
Page 1

CEE 696 Deep Learning in CEE and Earth Science
HW1/Brief History of Neural Nets/Misc.

9/24/2019

Harry Lee

https://www2.hawaii.edu/~jonghyun/classes/F19/CEE696/schedule.html

Page 2

HW1 Comments
Objective: overfit the data as much as possible

Brief summary: students used

NN architectures with 3 to 5 layers

Activation functions: sigmoid, ReLU, leaky ReLU

32 to 512 neurons per layer

MSE: 1e-6 to 1e-2

# of parameters from 5,000 to 500,000 (2-512-512-512-1; a minimal sketch of this largest configuration follows)
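For reference, here is a minimal sketch of that largest configuration (2-512-512-512-1) in tf.keras, assuming the HW1 setup of 2 input features, 1 regression target, and an MSE loss; the ReLU activation and Adam optimizer are illustrative choices, not the only ones students used.

import tensorflow as tf

# 2-512-512-512-1: 2 inputs, three hidden layers of 512 ReLU units, 1 linear output
model = tf.keras.Sequential([
    tf.keras.layers.Dense(512, activation='relu', input_shape=(2,)),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(1)  # linear output for regression
])
model.compile(optimizer='adam', loss='mse')
model.summary()  # reports roughly 500,000 trainable parameters, consistent with the range above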


Page 3

Optimizers and Epochs

You don't want to stop at 50 epochs; monitoring the learning is important.
SGD vs. adaptive learning rate methods (Adam, RMSprop, and so on)? A sketch of both options follows.
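A minimal sketch of this comparison, reusing the HW1 model and assuming x, y are the training arrays; the epoch count, batch size, and learning rates are illustrative:

import tensorflow as tf
import matplotlib.pyplot as plt

# model, x, y are assumed to be defined as in the HW1 setup.
# Try plain SGD or an adaptive method and compare the resulting loss curves.
opt = tf.keras.optimizers.Adam(learning_rate=1e-3)   # or tf.keras.optimizers.SGD(learning_rate=1e-2)
model.compile(optimizer=opt, loss='mse')

history = model.fit(x, y, epochs=2000, batch_size=32, verbose=0)

# Monitor the learning instead of stopping blindly at a fixed epoch count.
plt.semilogy(history.history['loss'])
plt.xlabel('epoch')
plt.ylabel('training MSE')
plt.show()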

Page 4

Results

Underfitting vs. overfitting: we will come back to this topic later; focus on minimization for now.


Page 5

tf.keras.backend.clear_session() may be helpful; it resets the global Keras state, which is useful when you rebuild models repeatedly in the same notebook session.
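For example (the build_model() call below is a placeholder for your own model-construction code):

import tensorflow as tf

# Reset the global Keras state (layer name counters, old graphs) before
# rebuilding a model in the same notebook session.
tf.keras.backend.clear_session()
model = build_model()  # hypothetical helper standing in for your own construction code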

"I don't like notebooks" talk by Joel Grus: https://docs.google.com/presentation/d/1n2RlMdmv1p25Xy5thJUhkKGvjtV-dkAIsUXP-AL4ffI/edit#slide=id.g362da58057_0_1

Page 6

HW1: Validation
While we overfit the data in HW1, we might still want to monitor the validation error via:

model.fit(x, y, epochs=1, validation_split=0.0, shuffle=True)  # defaults shown
# e.g., validation_split=0.2 holds out 20% of the training data; validation_data=(x_val, y_val) will override validation_split

validation_split : float between 0 and 1.

Fraction of the training data to be used as validation.

The model will set apart this fraction of the training data, will not train on it, and will evaluate the loss and any model metrics on this data at the end of each epoch. The validation data is selected from the last samples in the x and y data provided, before shuffling. The shuffle argument applies to mini-batch shuffling during training/optimization, not to the validation split; don't confuse the two!

Thus, you have to shuffle your data set yourself before you hand it to tf.keras. Alternatively:

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, shuffle=True)
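A sketch of how the two pieces fit together, assuming model is the HW1 network; the held-out portion is passed explicitly, so it overrides validation_split (the epoch count is illustrative):

history = model.fit(x_train, y_train, epochs=500,
                    validation_data=(x_test, y_test))

# history.history['loss'] and history.history['val_loss'] now track the
# training and validation error at the end of every epoch.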

Page 7

Extension of Vanilla DNN
Before we move on to more advanced topics, what can we do with what we have learned? Do you think our NN model looks simple and limited?

Actually we can do lots of interesting tests with our basic DNN. For example:

Q: How can we learn without outputs/labels, i.e., unsupervised learning?

A: Construct auto-associative NNs, i.e., train on (input, input) pairs (a minimal sketch follows)
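A minimal sketch, assuming x is an (n_samples, n_features) input array; the hidden width and epoch count are illustrative choices:

import tensorflow as tf

n_features = x.shape[1]  # x is assumed to be the input array

# Auto-associative network: the output dimension equals the input dimension,
# and the same array is used as both input and target, so no labels are needed.
autoassoc = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(n_features,)),
    tf.keras.layers.Dense(n_features)
])
autoassoc.compile(optimizer='adam', loss='mse')
autoassoc.fit(x, x, epochs=100)  # train on (input, input) pairs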


Page 8

Autoassociative/self-associative NNs
NNs can reconstruct input images (Input -> model -> Input).

Copy this script to your drive


Page 9

Autoassociative NNs
Copy this script to your drive and run it.

Properties of NNs similar to human memory:

content-addressable/autoassociative memory

pattern recognition with incomplete information: generalization

model performance is relatively insensitive to deleting (even a large number of) neurons: graceful degradation


Page 10

Autoassociative NNs: Autoencoder

By introducing latent variables in the middle of the hidden layers, we can perform feature selection or construct a generative model.
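A sketch of that idea with the Keras functional API, assuming flattened inputs x of dimension n_features; the 2-dimensional latent space and layer widths are illustrative choices:

import tensorflow as tf

n_features = x.shape[1]  # x is assumed to be the (flattened) input array

inputs = tf.keras.Input(shape=(n_features,))
h = tf.keras.layers.Dense(128, activation='relu')(inputs)
z = tf.keras.layers.Dense(2, name='latent')(h)  # latent variables in the middle
h = tf.keras.layers.Dense(128, activation='relu')(z)
outputs = tf.keras.layers.Dense(n_features)(h)

autoencoder = tf.keras.Model(inputs, outputs)
autoencoder.compile(optimizer='adam', loss='mse')
autoencoder.fit(x, x, epochs=100)

# The encoder half maps data to the learned features (feature extraction);
# pushing new latent vectors through the decoder half acts as a simple generative model.
encoder = tf.keras.Model(inputs, z)
features = encoder.predict(x)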


Page 11

History of NNs
McCulloch & Pitts, "A Logical Calculus of the Ideas Immanent in Nervous Activity", 1943

Rosenblatt, Perceptron, 1962

Minsky and Papert, 1969: a perceptron cannot learn unless a problem is simple (i.e., linearly separable)

The First NN Winter (1969 ~ 1980): only a few scientists worked on NNs

Progression (1980 ~ 1990): Hopfield Net 1982, Boltzmann Machine 1985, Backpropagation 1986

The Second NN Winter (1993 ~ 2000s): Support Vector Machine 1995, Graphical Models

Progression (2006 - ): Deep Learning, Convolutional Networks, GPUs

So what's next?

Page 12

Neural network zoo diagram, adapted from http://www.asimovinstitute.org/neural-network-zoo/