TRANSCRIPT
CSCE 636 Neural Networks (Deep Learning)
Lecture 2: Mathematical Building Blocks of Neural Networks
Anxiao (Andrew) Jiang
Chapter 2
Before we begin: the mathematical building blocks of neural networks
What a neural network does: learn a function
[Figure: input x → Neural Network → value of f(x)]
The neural network learns the function f(x), either exactly or approximately.
Application: Handwritten Digit Recognition
[Figure: image of a handwritten "4" → Neural Network → 4]
Task: classify grayscale images of handwritten digits (28x28 pixels) into their 10 categories (0 through 9).
How to start?
Step 1: Load the dataset
MNIST dataset: 60,000 training images and 10,000 test images, along with their labels.
Step 1: Load the dataset
train_images: 60,000 x 28 x 28 array, where each element (pixel) is an integer in [0, 255]
train_labels: vector of length 60,000, where each element (label) is an integer in [0, 9]
test_images: 10,000 x 28 x 28 array, where each element (pixel) is an integer in [0, 255]
test_labels: vector of length 10,000, where each element (label) is an integer in [0, 9]
Rule: training data and test data are disjoint. Only use training data to train the neural network!
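The data layout described above can be sketched with a tiny stand-in dataset (plain Python lists in place of the real 60,000-image arrays; the pixel values and labels below are made up purely for illustration):

```python
# Tiny stand-in for the MNIST layout: two 2x2 "images" instead of
# 60,000 images of size 28x28. Structure mirrors the slide above.

train_images = [
    [[0, 255], [128, 64]],   # image 0: a 2-d array of pixels in [0, 255]
    [[12, 34], [56, 78]],    # image 1
]
train_labels = [3, 7]        # one integer label in [0, 9] per image

# Sanity checks matching the stated rules.
assert len(train_images) == len(train_labels)
assert all(0 <= p <= 255 for img in train_images for row in img for p in row)
assert all(0 <= y <= 9 for y in train_labels)
```

The real loader returns the four arrays (train_images, train_labels, test_images, test_labels) with exactly this nesting, just at full size.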
Step 2: Build neural network architecture
[Figure: a 28x28 2-d array (the image of a "4") feeds an input layer of 28x28 = 784 units (indices 0 through 28x28-1 = 783), fully connected to a dense layer of 512 neurons (indices 0 through 511).]
What is a neuron?
[Figure: a neuron computes a weighted sum of its inputs plus a bias, then applies an activation function. The activation function can take many possible forms.]
ReLU (the most popular activation function): relu(z) = max(0, z)
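A single neuron with a ReLU activation can be sketched in a few lines of plain Python (an illustration of the concept, not any library's API):

```python
def relu(z):
    """ReLU activation: max(0, z)."""
    return max(0.0, z)

def neuron(x, w, b):
    """One neuron: weighted sum of inputs plus bias, then ReLU."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return relu(z)

# A positive pre-activation passes through; a negative one is clipped to 0.
print(neuron([1.0, 2.0], [0.5, 0.25], 0.0))   # 1.0
print(neuron([1.0, 2.0], [-1.0, -1.0], 0.0))  # 0.0
```

A dense layer of 512 neurons is just 512 of these, each with its own weight vector w and bias b, all reading the same 784-length input.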
Step 2: Build neural network architecture
[Figure: the same network, now with an output layer of 10 neurons (indices 0 through 9) after the 512-neuron dense layer.]
Softmax (a popular activation function for the last layer of a classification network)
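Softmax can be sketched in plain Python; it turns the output layer's raw scores into probabilities that are positive and sum to 1 (subtracting the maximum first is a standard numerical-stability trick, not required by the math):

```python
import math

def softmax(z):
    """Softmax: exponentiate each score, then normalize to sum to 1."""
    m = max(z)                                # for numerical stability
    exps = [math.exp(zi - m) for zi in z]
    s = sum(exps)
    return [e / s for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)       # largest score gets the largest probability
print(sum(probs))  # 1.0 (up to floating-point rounding)
```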
Step 2: Build neural network architecture
[Figure: the full network. A 28x28 2-d array feeds 784 input units (indices 0 through 28x28-1 = 783), then a dense layer of 512 neurons (indices 0 through 511), then an output layer of 10 neurons (indices 0 through 9) whose outputs are the probabilities of labels "0", "1", "2", …, "9".]
Step 3: Choose loss function, optimizer, and target metrics
Categorical cross-entropy (a popular loss function for multi-class classification):

L = -(1/N) * Σ_{i=1..N} Σ_{j=1..C} y_ij * log(p_ij)

where N is the number of samples, C is the number of classes, y_ij is the true probability (1 or 0) that input i belongs to class j, and p_ij is the probability predicted by the neural network that input i belongs to class j.

RMSProp (a popular optimizer; details to be introduced later)
Accuracy: the fraction of times that the neural network makes correct predictions.
• If we care about accuracy, why do we optimize categorical cross-entropy during training?
• Answer: the loss function needs to be differentiable. (And the loss function is closely related to the target metric: minimizing the loss function is (approximately or precisely) optimizing the target metric.)
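The loss defined above can be sketched in plain Python (a didactic version; real libraries use vectorized, numerically safer implementations):

```python
import math

def categorical_cross_entropy(y_true, y_pred):
    """L = -(1/N) * sum_i sum_j y_ij * log(p_ij).
    y_true: list of one-hot rows; y_pred: list of predicted-probability rows."""
    n = len(y_true)
    total = 0.0
    for yi, pi in zip(y_true, y_pred):
        # Only the term where y_ij = 1 contributes to the inner sum.
        total += -sum(math.log(p) for y, p in zip(yi, pi) if y == 1)
    return total / n

# One sample whose true class is 1, predicted with probability 0.8:
loss = categorical_cross_entropy([[0, 1, 0]], [[0.1, 0.8, 0.1]])
print(round(loss, 4))  # 0.2231, i.e. -ln(0.8)
```

Note the loss is 0 exactly when the network assigns probability 1 to every true class, and grows as the predicted probability of the true class shrinks.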
[Figure: the full network again (28x28 2-d array → 784 input units → 512 neurons → 10 output neurons giving the probabilities of labels "0" through "9"), now supervised by the "Teacher": loss function: categorical cross-entropy; optimizer: RMSProp; target metric: accuracy.]
Step 4: Prepare training and test data
Here: reshape and normalize the input training data.
train_images:
• Originally: 3-dimensional array of size 60000 x 28 x 28, where each element is an integer in [0, 255]
• After reshaping: 2-dimensional array of size 60000 x 784, where each element is an integer in [0, 255]
• After normalization: 2-dimensional array of size 60000 x 784, where each element is a real number in [0, 1]
[Figure: each 28x28 2-d array is reshaped to a 1-d array of length 28*28 = 784, and the values in the array are normalized to lie between 0 and 1.]
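For a single image, the reshape-and-normalize step looks like this (using a tiny 2x2 stand-in image instead of 28x28):

```python
# Step 4 for one image: flatten the 2-d pixel grid into a 1-d vector,
# then scale integer pixels in [0, 255] to real values in [0, 1].

image = [[0, 255], [128, 64]]  # tiny stand-in for a 28x28 image

flat = [p for row in image for p in row]  # "reshape" 2-d -> 1-d
normalized = [p / 255.0 for p in flat]    # normalize to [0, 1]

print(flat)        # [0, 255, 128, 64]
print(normalized)  # [0.0, 1.0, 0.5019..., 0.2509...]
```

The same two operations, applied to all 60,000 images at once, turn the 60000 x 28 x 28 integer array into the 60000 x 784 real-valued array described above.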
[Figure: the network with the reshaped input: a 1-d array of length 784 (indices 0 through 783) feeds the 512-neuron layer, then the 10-neuron output layer giving the probabilities of labels "0" through "9", with the "Teacher": loss function: categorical cross-entropy; optimizer: RMSProp; target metric: accuracy.]
"Reshape" the output training data: categorically encode each label using one-hot encoding.

Label  One-hot encoding
0      1,0,0,0,0,0,0,0,0,0
1      0,1,0,0,0,0,0,0,0,0
2      0,0,1,0,0,0,0,0,0,0
3      0,0,0,1,0,0,0,0,0,0
4      0,0,0,0,1,0,0,0,0,0
5      0,0,0,0,0,1,0,0,0,0
6      0,0,0,0,0,0,1,0,0,0
7      0,0,0,0,0,0,0,1,0,0
8      0,0,0,0,0,0,0,0,1,0
9      0,0,0,0,0,0,0,0,0,1
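The table above corresponds to a short encoding function:

```python
def one_hot(label, num_classes=10):
    """One-hot encode an integer label: a length-10 vector with a single 1
    at position `label`, exactly as in the table above."""
    v = [0] * num_classes
    v[label] = 1
    return v

print(one_hot(3))  # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```

Applying it to every entry of train_labels turns the length-60,000 label vector into a 60000 x 10 array, matching the network's 10-neuron output.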
[Figure: the network and the "Teacher" again, now with both the reshaped 784-length input (indices 0 through 783) and the one-hot-encoded 10-length output (indices 0 through 9).]
Step 5: Train the neural network
[Figure: during training, (reshaped 784-length input, one-hot 10-length label) pairs flow through the network while the "Teacher" (loss function, optimizer, target metric: accuracy) updates the weights.]
Batch size: the number of samples to use each time for computing the loss function and updating the weights.
Epochs: the number of times the training process uses the whole training dataset.
And so on (5 epochs in total).
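With these two hyperparameters fixed, the number of weight updates in a training run follows directly (the batch size of 128 below is an assumption for illustration; the slides only fix the 5 epochs):

```python
import math

def num_weight_updates(num_samples, batch_size, epochs):
    """Each epoch processes ceil(num_samples / batch_size) batches,
    and every batch triggers one weight update."""
    return epochs * math.ceil(num_samples / batch_size)

# 60,000 MNIST training images, an assumed batch size of 128, 5 epochs:
print(num_weight_updates(60000, 128, 5))  # 2345
```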
Accuracy on training data: 97.8%
Step 6: Test the trained neural network
Compare to training accuracy: 0.989
Test accuracy is (clearly) lower than training accuracy. Maybe there is some over-fitting to the data.
But still, the performance is nice!
Summary
Step 1: Load the dataset
Step 2: Build neural network architecture
[Figure: 28x28 2-d array → 784 input units (indices 0 through 28x28-1 = 783) → 512 neurons → 10 output neurons.]
Step 3: Choose loss function, optimizer, and target metrics
[Figure: the network's 10 outputs are the probabilities of labels "0" through "9"; the "Teacher" supplies the loss function, optimizer, and target metric.]
Step 4: Prepare training and test data
[Figure: the reshaped 784-length input and one-hot 10-length output, with the "Teacher": loss function, optimizer, target metric.]
Step 5: Train the neural network
[Figure: the full training setup: 784-length input → 512 neurons → 10 neurons, with the "Teacher" (loss function, optimizer, target metric: accuracy).]
Step 6: Test the trained neural network
How did I do?
Well…
Miscellaneous Basic Concepts
Data representation: Tensor (Array)
• Scalar numbers (0-dimensional tensors)
• Vectors (1-d tensors)
• Matrices (2-d tensors)
• 3-d tensors, and higher-dimensional tensors
Key attributes of a tensor:
• (1) number of axes
• (2) shape
• (3) data type
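Representing tensors as nested Python lists, the first key attribute (the number of axes, also called the rank) can be illustrated like this:

```python
def num_axes(t):
    """Number of axes of a nested-list 'tensor': 0 for a scalar,
    1 for a vector, 2 for a matrix, and so on."""
    n = 0
    while isinstance(t, list):
        n += 1
        t = t[0]
    return n

scalar = 5                  # 0-d tensor
vector = [1, 2, 3]          # 1-d tensor
matrix = [[1, 2], [3, 4]]   # 2-d tensor
print(num_axes(scalar), num_axes(vector), num_axes(matrix))  # 0 1 2
```

Shape and data type follow the same idea: the shape is the size along each axis (here (2, 2) for the matrix), and the data type is the type of the stored elements.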
Some basic tensor operations
• Add two tensors (of the same shape): element-wise addition
• Apply a ReLU activation function to a tensor: element-wise operation
• Tensor product (also called tensor dot)
• Reshape a tensor
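These operations can be sketched in plain Python on 1-d and 2-d "tensors" represented as lists (library versions are vectorized, but the element-wise logic is the same):

```python
def add_tensors(u, v):
    """Element-wise addition of two 1-d tensors of the same shape."""
    return [a + b for a, b in zip(u, v)]

def relu_tensor(u):
    """Element-wise ReLU applied to a 1-d tensor."""
    return [max(0, a) for a in u]

def dot(u, v):
    """Tensor dot of two 1-d tensors (vectors): sum of products."""
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    """Tensor dot of a 2-d tensor (matrix) with a 1-d tensor (vector)."""
    return [dot(row, v) for row in M]

print(add_tensors([1, 2], [3, 4]))               # [4, 6]
print(relu_tensor([-1, 0, 2]))                   # [0, 0, 2]
print(dot([1, 2, 3], [4, 5, 6]))                 # 32
print(matvec([[1, 0], [0, 1], [2, 2]], [3, 4]))  # [3, 4, 14]
```

The matrix-vector product is exactly what a dense layer computes before adding biases and applying the activation function.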
Basic terms for a neural network
• Layers: the building blocks of a neural network
• Model: a network of layers
• Loss function and optimizer: keys to configuring the learning process
Keras: a deep learning library for Python
PyTorch is getting popular today
Keras: a deep learning library for Python
Use a GPU when possible
Jupyter notebook: a nice way to edit and run deep learning experiments