Chapter 7 (part 3) Neural Networks

Page 1:

Chapter 7 (part 3) Neural Networks

Page 2:

What are Neural Networks?
• An extremely simplified version of the brain
• Essentially a function approximator
  • Transforms inputs into outputs to the best of its ability

[Figure: inputs → NN → outputs]

Page 3:

What are Neural Networks?

Page 4:

What are Neural Networks?
• For the most part, neural networks are useful in situations where multiple regression is used.
• In a neural network:
  • the independent variables are called input cells;
  • the dependent variable is called an output cell (depending on the network model there might be more than one output).
• As in regression, there is a certain number of observations (say N).
• Each observation contains a value for each input cell (or independent variable) and for the output cell (or dependent variable).

Page 5:

In an illustration of a neural network, each circle is a cell of the network. There are three different types of cells:
• Input cells are the cells in the first column of the network; this first column is called the Input Layer.
• The Output cell is the cell in the last column of the network, and this last layer is the Output Layer.
• All other layers of the network are called Hidden Layers.

Page 6:

• Cell 0 may be viewed as a special input cell, analogous to the constant term in multiple regression.

• Each other Input cell in the input layer corresponds to an independent variable.

• Each cell in the Output layer represents a dependent variable we want to predict.

• The arc in the network joining cells i and j has an associated weight wij.

Page 7:

[Figure: a network with bias cell 0, Input Layer cells 1-4, Hidden Layer cells 5-7, and Output Layer cell 8.]

The figure above has 4 input variables and one hidden layer. With the exception of cell 0, each cell in the network is connected by an arc to each cell in the next network layer. In addition, there is an arc connecting cell 0 with each cell in a Hidden or Output Layer.

Page 8:

INPUT AND OUTPUT VALUES

For any observation, each cell in the network has an associated input value and output value.

For any observation, each input cell i (other than cell 0):
• has an input value equal to the value of input i for that observation;
• has an output value equal to the value of input i for that observation.

For example, if the first input equals 3, then the input and output from node 1 are both 3.

Page 9:

For cell 0, the input and output values always equal 1.

Let INP(j) = input to cell j and OUT(j) = output from cell j.

For now we suppress the dependence of INP(j) and OUT(j) on the particular observation.

For any cell j not in the Input Layer,

INP(j) = Σ (over all cells i connected to cell j) wij · (output from cell i)

Page 10:

The following equations illustrate the use of the previous formula by computing the inputs to cells 5-7 of the previous neural network figure. Suppose that for a given observation input cell j takes the value Ij. Then:

INP(5) = w05(1) + w15I1 + w25I2 + w35I3 + w45I4
INP(6) = w06(1) + w16I1 + w26I2 + w36I3 + w46I4
INP(7) = w07(1) + w17I1 + w27I2 + w37I3 + w47I4

[Figure: the same network as before, with Input Layer cells 1-4, bias cell 0, Hidden Layer cells 5-7, and Output Layer cell 8.]

Page 11:

To determine the output from any cell not in the input layer we need to use a transfer function. We will later discuss the most commonly used transfer functions, but for now the transfer function f may stand for any function. Then for any cell j not in the Input Layer

OUT(j) = f(INP(j))

For our figure, this formula yields
OUT(5) = f(INP(5))
OUT(6) = f(INP(6))
OUT(7) = f(INP(7))

Page 12:

By using the formulas and the previous results, we can now determine the input and output for our output cell, cell 8.

INP(8) = w08(1) + w58OUT(5) + w68OUT(6) + w78OUT(7)

OUT(8) = f(INP(8))

For any observation, OUT(8) is our prediction for the output cell.
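To make the forward-pass arithmetic concrete, here is a minimal Python sketch of these computations for the network in the figure (input cells 1-4, hidden cells 5-7, output cell 8, bias cell 0). The weight values and the identity transfer function below are placeholders for illustration, not values from the text.

def forward(inputs, w_hidden, w_out, f):
    # inputs: [I1, I2, I3, I4]
    # w_hidden[k]: weights [w0j, w1j, w2j, w3j, w4j] into hidden cell j
    # w_out: weights [w08, w58, w68, w78] into output cell 8
    hidden_out = []
    for w in w_hidden:
        # INP(j) = w0j(1) + sum over i of wij * OUT(i); OUT(i) = Ii for input cells
        inp_j = w[0] * 1 + sum(wij * ii for wij, ii in zip(w[1:], inputs))
        hidden_out.append(f(inp_j))  # OUT(j) = f(INP(j))
    # INP(8) = w08(1) + w58*OUT(5) + w68*OUT(6) + w78*OUT(7)
    inp_8 = w_out[0] * 1 + sum(wj8 * oj for wj8, oj in zip(w_out[1:], hidden_out))
    return f(inp_8)  # OUT(8): the network's prediction

# Example with made-up weights and the identity transfer function:
w_hidden = [[0.1, 0.2, -0.3, 0.4, 0.5],   # weights into hidden cell 5
            [0.0, 0.1, 0.1, -0.2, 0.3],   # weights into hidden cell 6
            [-0.1, 0.4, 0.2, 0.1, 0.0]]   # weights into hidden cell 7
w_out = [0.2, 0.5, -0.4, 0.3]             # weights into output cell 8
print(forward([3, 1, 2, 0], w_hidden, w_out, f=lambda x: x))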

Page 13:

The complex math involved in neural networks is used to compute weights that produce "good" predictions.

More formally, let

OUTj(8) = output of cell 8 for observation j

Oj = actual value of cell 8 for observation j.

The real work in the neural network analysis is to determine network weights wij that minimize

Σ (j = 1 to N) (OUTj(8) - Oj)²
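In code, this objective is a one-liner. A sketch, where predict is a hypothetical function that maps a weight collection and one observation's inputs to OUT(8):

def sse(predict, weights, observations):
    # observations: list of (inputs, O_j) pairs, O_j being the actual output.
    # Returns the sum over j of (OUTj(8) - Oj)^2, the quantity training minimizes.
    return sum((predict(weights, inputs) - o_j) ** 2
               for inputs, o_j in observations)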

Page 14:

SCALING DATA

For the computations in a neural net to be tractable, it is desirable to scale all data (inputs and outputs) so that every value lies in one of the following two intervals:

Interval 1: [0, 1] Interval 2: [-1, +1]

Page 15:

If you want to scale your data to lie on [0, 1], then any value x for Input i should be transformed into

(x - smallest value for Input i) / (range for Input i)

Note: Range = Largest value for Input i - Smallest value for Input i

and any value x for the output should be transformed into

(x - smallest value of output) / (range for output)

Page 16:

If you want to scale your data to lie on [-1, +1], then any value x for Input i should be transformed into

2[x - (largest value for Input i)] / (range for Input i) + 1

and any value x for the output should be transformed into

2[x - (largest value of output)] / (range for output) + 1
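Both transformations are straightforward to code. A minimal sketch of the two scalings above (the function names are mine):

def scale_01(x, smallest, largest):
    # Map x from [smallest, largest] onto [0, 1].
    return (x - smallest) / (largest - smallest)

def scale_pm1(x, smallest, largest):
    # Map x from [smallest, largest] onto [-1, +1].
    return 2 * (x - largest) / (largest - smallest) + 1

# Quick check for an input whose values range over [10, 50]:
print(scale_01(10, 10, 50), scale_01(50, 10, 50))    # 0.0 1.0
print(scale_pm1(10, 10, 50), scale_pm1(50, 10, 50))  # -1.0 1.0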

Page 17:

THE SIGMOID TRANSFER FUNCTION

Vast experience with neural networks indicates that the sigmoid transfer function usually yields the best predictions. If your data has been scaled to lie on [-1, +1], the relevant sigmoid function is given by

f(input) = 2 / (1 + e^(-input)) - 1

Note that for this function an input near -∞ yields an output near -1 and an input near +∞ yields an output near +1.

Page 18:

If your data has been scaled to lie on [0, 1], the relevant sigmoid function is given by

f(input) = 1 / (1 + e^(-input))

Note that for this function an input near -∞ yields an output near 0 and an input near +∞ yields an output near 1.
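Both functions are easy to check numerically; a minimal sketch:

import math

def sigmoid_01(inp):
    # Sigmoid for data scaled to [0, 1]: f(input) = 1 / (1 + e^(-input))
    return 1 / (1 + math.exp(-inp))

def sigmoid_pm1(inp):
    # Sigmoid for data scaled to [-1, +1]: f(input) = 2 / (1 + e^(-input)) - 1
    return 2 / (1 + math.exp(-inp)) - 1

# Large-magnitude inputs approach the interval endpoints:
print(sigmoid_pm1(-10), sigmoid_pm1(10))  # close to -1 and +1
print(sigmoid_01(-10), sigmoid_01(10))    # close to 0 and 1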

Page 19:

REMARKS

1. The sigmoid function is often called the squashing function because it "squashes" values on the interval (-∞, +∞) into the unit interval [0, 1].
2. The slope of the sigmoid function for the [0, 1] interval is given by

f'(Input) = f(Input)(1-f(Input))

This implies that the sigmoid function is very steep for intermediate values of the input and very flat for extreme input values.
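The slope identity, and the steep-middle/flat-extremes behavior, can be verified with a finite-difference check; a sketch:

import math

def f(inp):
    # Sigmoid for the [0, 1] interval.
    return 1 / (1 + math.exp(-inp))

h = 1e-6
for inp in (-5.0, 0.0, 2.0):
    numeric = (f(inp + h) - f(inp - h)) / (2 * h)  # finite-difference slope
    identity = f(inp) * (1 - f(inp))               # f'(input) = f(input)(1 - f(input))
    print(inp, round(numeric, 6), round(identity, 6))
# The slope peaks at input = 0 (value 0.25) and is nearly flat at extreme inputs.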

Page 20:

TESTING AND VALIDATION

When we fit a regression to data, we often use 80%-90% of the data to fit a regression equation, and use the remaining data to "validate" the equation.

The same technique is used in neural networks. We begin by designating 80%-90% of our data as the training or learning data set. Then we "fit" a neural network to this data.

Page 21:

Suppose cell 8 is the output cell.

Let Oj(8) = the actual output for observation j

And

OUTj(8) be the value of the output cell for observation j.

Let AVGOT(8) = average value of the output for the training data.

Page 22:

Analogous to regression, define:

SST(Train) = Sum of Squares Total for the Training data
SST(Train) = Σj (Oj(8) - AVGOT(8))²

and

SSR(Train) = Σj (OUTj(8) - AVGOT(8))²

Define:

R2(Train) = SSR(Train)/SST(Train)
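Computed directly from these definitions, with made-up numbers standing in for a real training set; a sketch:

def r2_train(actual, predicted):
    # R2(Train) = SSR(Train) / SST(Train), per the definitions above.
    avg = sum(actual) / len(actual)                # AVGOT(8)
    sst = sum((o - avg) ** 2 for o in actual)      # SST(Train)
    ssr = sum((p - avg) ** 2 for p in predicted)   # SSR(Train)
    return ssr / sst

# Illustrative (made-up) actual outputs and network predictions:
actual = [0.2, 0.5, 0.9, 0.4]
predicted = [0.25, 0.45, 0.85, 0.45]
print(r2_train(actual, predicted))   # about 0.73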

Page 23:

If the network is to be useful for forecasting, the R2 computed from the Test portion of the data should be close to R2(Train).

Page 24:

CONTINUOUS AND BINARY DATA

If your dependent variable assumes only two values (say 0 and 1) we say we have binary data.

In this case the usual procedure is to train the network until as many predicted outputs as possible are either below 0.1 or above 0.9. Then observations with predictions less than 0.1 are classified as 0, and observations with predictions larger than 0.9 are classified as 1.

If we do not have binary data we say that the data is continuous.
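The classification rule is simple to state in code. A sketch using the 0.1/0.9 cutoffs above; leaving in-between predictions unclassified is one reasonable convention, not the only one:

def classify(prediction, low=0.1, high=0.9):
    # Classify a network output for binary data.
    if prediction < low:
        return 0
    if prediction > high:
        return 1
    return None  # ambiguous: train further or flag the observation

print([classify(p) for p in (0.03, 0.95, 0.5)])  # [0, 1, None]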

Page 25:

Examples of the Use of Neural Networks

Example 1: The efficient market hypothesis of financial markets states that the "past history" of a stock's returns yields no information about the future return of the stock. White (1988) examines returns on IBM to see if the market is efficient. He begins by running a multiple regression where the dependent variable is the next day's return on IBM stock and the five independent variables are the return on IBM during each of the last five days.

Page 26:

This regression yielded R2 = .0079, which is consistent with the efficient market hypothesis.

White then ran a neural network (containing one hidden layer) with the output cell corresponding to the next day's return on IBM and 5 input cells corresponding to the last five days' returns on IBM. This neural network yielded R2 = .179. This implies that the past five days of IBM returns do contain information that can be used to make predictions about tomorrow's return on IBM!

According to the October 9, 1993 Economist, Fidelity manages 2.6 billion dollars in assets using neural nets. One of the neural net funds has beaten the S&P 500 by 2-7% a quarter for more than three years.

Page 27:

Example 2: Researchers at Carnegie-Mellon University have developed ALVINN (Autonomous Land Vehicle In a Neural Network), a neural network that can drive a car! It can tell if cars are nearby and then slow down the car. Within ten years a neural network may be driving your car!

Page 28:

Example 3: In finance and accounting it is important to accurately predict whether or not a company will go bankrupt during the next year. Altman (1968) developed a method (Altman's Z-statistic) to predict whether or not a firm will go bankrupt during the next year based on the firm's financial ratios. This method used a version of regression called discriminant analysis. Neural networks using financial ratios as input cells have outperformed Altman's Z.

Page 29:

Example 4: The September 22, 1993 New York Times reported that Otis Elevator uses neural networks to direct elevators. For example, if elevator 1 is on floor 10 and going up, elevator 2 is on floor 6 going down, and elevator 3 is on floor 2 and going up, which elevator should answer a call to go down from floor 7?

Example 5: Many banks (Mellon and Chase are two examples) and credit card companies use neural networks to predict (on the basis of past usage patterns) whether or not a credit card transaction should be disallowed. AVCO Financial used a neural net to determine whether or not to lend people money. They increased their loan volume by 25% and decreased their default rate by 20%!

Page 30:

Example 6: "Pen" computers and personal digital assistants often use neural nets to read the user's handwriting. The "inputs" to the network are a binary representation of what the user has written. For example, let 1 = a place where we have written something and 0 = a place where we have not written something. An input to the network might look like

00011110001000000111000000100001110

Page 31:

The neural net must decide whether to classify this input as an "s", a "5", or something else.

LeCun tried to have a neural net "read" handwritten zip code digits. 7291 digits were used for training and 2007 for testing. Running the neural net took three days on a Sun workstation. The net correctly classified 99.86% of the Training data and 95.0% of the Test data.

Page 32:

Why Neural Networks Can Beat Regression: The XOR Example

The classical XOR data set can be used to obtain a better understanding of how neural networks work, and why they can pick up patterns that regression often misses. The XOR data set also illustrates the usefulness of a hidden layer. The XOR data set contains two inputs, one output, and four observations.

Page 33:

The data set is given below:

Observation Input 1 Input 2 Output

1 0 0 0

2 1 0 1

3 0 1 1

4 1 1 0

We see that the output equals 1 if either input (but not both) is equal to 1.

If we try to use regression to predict the output from the two inputs, we obtain the fitted equation ŷ = .5 (a constant). This equation yields an R2 = 0, which means that linear multiple regression yields poor predictions indeed.
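This can be verified with ordinary least squares on the four observations; a sketch using numpy:

import numpy as np

# XOR data: columns are [constant, Input 1, Input 2].
X = np.array([[1, 0, 0],
              [1, 1, 0],
              [1, 0, 1],
              [1, 1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)                  # approximately [0.5, 0, 0], i.e. y_hat = .5
pred = X @ beta
ss_res = ((y - pred) ** 2).sum()
ss_tot = ((y - y.mean()) ** 2).sum()
print(1 - ss_res / ss_tot)   # R2 = 0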

Page 34:

Now let's use the neural network of the following figure to predict the output.

[Figure: a network with bias cell 0, input cells 1 and 2, and output cell 3; there is no hidden layer.]

We assume that for some θ the transfer function for the output cell is defined by f(x) = 1 if x ≥ θ and f(x) = 0 if x < θ. Given this transfer function, it is natural to ask whether there are any values for θ and the wij's that will enable the above network to make the correct predictions for the data set in the table on the previous slide.

Page 35:

From the figure on the previous slide we find

INP(3) = w03 + w13(Input 1) + w23(Input 2)

and

OUT(3) = 1 if INP(3) ≥ θ and OUT(3) = 0 if INP(3) < θ

This implies that the network will yield correct predictions for each observation if and only if the following four inequalities hold:

Observation 1: w03 < θ
Observation 2: w03 + w13 ≥ θ
Observation 3: w03 + w23 ≥ θ
Observation 4: w03 + w13 + w23 < θ

Page 36:

There are no values of w03, w13, w23 and θ that satisfy all four inequalities.

To see this, note that together

Observation 3: w03 + w23 ≥ θ
Observation 4: w03 + w13 + w23 < θ

imply w13 < 0. Combining w13 < 0 with Observation 1: w03 < θ implies that w03 + w13 < θ, which contradicts

Observation 2: w03 + w13 ≥ θ

Thus we have seen that there is no way the neural network of this figure can correctly predict the output for each observation.
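The algebra above is a proof; a brute-force search over a grid of candidate weights and thresholds illustrates the same conclusion empirically (a finite grid cannot prove impossibility, but it never finds a solution). A sketch:

import itertools

xor_data = [((0, 0), 0), ((1, 0), 1), ((0, 1), 1), ((1, 1), 0)]

def predicts_xor(w03, w13, w23, theta):
    # OUT(3) = 1 if w03 + w13*(Input 1) + w23*(Input 2) >= theta, else 0.
    for (i1, i2), output in xor_data:
        inp3 = w03 + w13 * i1 + w23 * i2
        if (1 if inp3 >= theta else 0) != output:
            return False
    return True

grid = [k / 4 for k in range(-8, 9)]   # -2.0 to 2.0 in steps of 0.25
found = any(predicts_xor(*combo)
            for combo in itertools.product(grid, repeat=4))
print(found)   # False: no combination fits all four observations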

Page 37:

Suppose we add a hidden layer with two nodes to the previous figure's neural net. This yields the neural network in the following figure.

[Figure: a network with bias cell 0, input cells 1 and 2, hidden cells 3 and 4, and output cell 5.]

We define the transfer function fi at cell i by fi(x) = 1 if x ≥ θi and fi(x) = 0 for x < θi. If we choose θ3 = .4, θ4 = 1.2, θ5 = .5, w03 = w04 = w05 = 0, w13 = w14 = w23 = w24 = 1, w35 = .6, w45 = -.2, then the neural network of the figure above will yield correct predictions for each observation. We now verify that this is the case.

Page 38:

Observation 1:

Input 1 = Input 2 = 0, Output = 0
Cell 3 Input = 1(0) + 1(0) = 0
Cell 3 Output = 0 (0 < .4)
Cell 4 Input = 1(0) + 1(0) = 0
Cell 4 Output = 0 (0 < 1.2)
Cell 5 Input = .6(0) - .2(0) = 0
Cell 5 Output = 0 (0 < .5)

Page 39:

Observation 2:

Input 1 = 1, Input 2 = 0, Output = 1
Node 3 Input = 1(1) + 1(0) = 1
Node 3 Output = 1 (1 ≥ .4)
Node 4 Input = 1(1) + 1(0) = 1
Node 4 Output = 0 (1 < 1.2)
Node 5 Input = .6(1) - .2(0) = .6
Node 5 Output = 1 (.6 ≥ .5)

Page 40:

Observation 3:

Input 1 = 0, Input 2 = 1, Output = 1
Node 3 Input = 1(0) + 1(1) = 1
Node 3 Output = 1 (1 ≥ .4)
Node 4 Input = 1(0) + 1(1) = 1
Node 4 Output = 0 (1 < 1.2)
Node 5 Input = .6(1) - .2(0) = .6
Node 5 Output = 1 (.6 ≥ .5)

Page 41:

Observation 4:

Input 1 = Input 2 = 1, Output = 0
Node 3 Input = 1(1) + 1(1) = 2
Node 3 Output = 1 (2 ≥ .4)
Node 4 Input = 1(1) + 1(1) = 2
Node 4 Output = 1 (2 ≥ 1.2)
Node 5 Input = .6(1) - .2(1) = .4
Node 5 Output = 0 (.4 < .5)

We have found that the hidden layer enables us to perfectly fit the XOR data set!
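The four verifications above can be reproduced in a few lines of Python, using exactly the weights and thresholds given on the earlier slide; a sketch:

def step(x, theta):
    # Threshold transfer function: 1 if x >= theta, else 0.
    return 1 if x >= theta else 0

def xor_net(i1, i2):
    # Weights from the text: theta3 = .4, theta4 = 1.2, theta5 = .5,
    # w03 = w04 = w05 = 0, w13 = w14 = w23 = w24 = 1, w35 = .6, w45 = -.2
    out3 = step(0 + 1 * i1 + 1 * i2, 0.4)          # hidden cell 3
    out4 = step(0 + 1 * i1 + 1 * i2, 1.2)          # hidden cell 4
    return step(0 + 0.6 * out3 - 0.2 * out4, 0.5)  # output cell 5

for i1, i2 in ((0, 0), (1, 0), (0, 1), (1, 1)):
    print(i1, i2, xor_net(i1, i2))   # prints the XOR outputs 0, 1, 1, 0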

Page 42:

REMARK

One might think that having more hidden layers would lead to much more accurate predictions. Vast experience shows, however, that using more than one hidden layer rarely yields a significant improvement over a single hidden layer.