-
Loss Function – Quick Notes
Dragan Samardzija
January 2020
-
References
1. Wikipedia
2. Data Science: Deep Learning in Python, https://www.youtube.com/watch?v=XeQBsidyhWE
3. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron, https://www.youtube.com/watch?v=ErfnhcEV1O8
4. Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
-
Likelihood Interpretation
Information Theory Interpretation
-
Square Error Loss Function – Minimize
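The loss expression itself did not survive the transcript; a standard form, assuming N training targets y_n, model predictions ŷ_n, and model parameters θ (notation mine), is:

```latex
\min_{\theta} \; E(\theta) = \sum_{n=1}^{N} \left( y_n - \hat{y}_n \right)^{2}
```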
-
Likelihood – Gaussian Assumption – Maximize
The same answer, since log() is a monotonically increasing function.
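As a sketch of why the two coincide, assume y_n = ŷ_n + ε_n with i.i.d. Gaussian noise ε_n ~ N(0, σ²); the likelihood of the data and its logarithm are then:

```latex
\mathcal{L}(\theta) = \prod_{n=1}^{N} \frac{1}{\sqrt{2\pi\sigma^{2}}}
\exp\!\left(-\frac{(y_n - \hat{y}_n)^{2}}{2\sigma^{2}}\right),
\qquad
\log \mathcal{L}(\theta) = -\frac{1}{2\sigma^{2}} \sum_{n=1}^{N} (y_n - \hat{y}_n)^{2} + \text{const.}
```

Maximizing log L(θ) is therefore exactly minimizing the squared-error sum on the previous slide.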
-
Cross Entropy Loss Function – Binary Classification – Minimize
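The formula is again missing from the transcript; the standard binary cross-entropy, assuming labels y_n ∈ {0, 1} and predicted probabilities ŷ_n ∈ (0, 1), is:

```latex
\min_{\theta} \; J(\theta) = -\sum_{n=1}^{N} \Bigl[ y_n \log \hat{y}_n + (1 - y_n) \log (1 - \hat{y}_n) \Bigr]
```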
-
Likelihood – Maximize
The same answer, since log() is a monotonically increasing function.
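Here the likelihood is Bernoulli rather than Gaussian; under the same notation, a sketch:

```latex
\mathcal{L}(\theta) = \prod_{n=1}^{N} \hat{y}_n^{\,y_n} \, (1 - \hat{y}_n)^{1 - y_n},
\qquad
\log \mathcal{L}(\theta) = \sum_{n=1}^{N} \Bigl[ y_n \log \hat{y}_n + (1 - y_n) \log (1 - \hat{y}_n) \Bigr]
```

Maximizing log L(θ) is minimizing its negation, which is the cross-entropy loss on the previous slide.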
-
Illustration
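The figure on this slide is not reproduced in the transcript. As a stand-in, a minimal NumPy sketch (the function names and the eps clipping are my additions) that evaluates both losses on a single labeled example:

```python
import numpy as np

def squared_error(y, y_hat):
    """Sum of squared errors between targets y and predictions y_hat."""
    return np.sum((y - y_hat) ** 2)

def binary_cross_entropy(y, y_hat, eps=1e-12):
    """Binary cross-entropy; eps keeps log() away from zero."""
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Single target y = 1: both losses fall as the prediction approaches 1,
# but cross-entropy punishes confident mistakes far more sharply.
y = np.array([1.0])
for p in (0.1, 0.5, 0.9, 0.99):
    y_hat = np.array([p])
    print(f"y_hat={p:.2f}  SE={squared_error(y, y_hat):.4f}  "
          f"CE={binary_cross_entropy(y, y_hat):.4f}")
```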
-
Likelihood Interpretation
Information Theory Interpretation
-
Number of Bits Needed to Encode
• Information entropy is the average bit rate at which information is produced by a stochastic source of data.
(Pictured: Claude Shannon, Ludwig Boltzmann)
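In symbols, assuming a discrete source emitting symbols x with probabilities p(x), the entropy in bits (base-2 logarithm) is:

```latex
H(p) = -\sum_{x} p(x) \log_2 p(x)
```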
-
Number of Bits when Mismatched
• Cross entropy between two probability distributions p and q measures the average number of bits needed to identify an event drawn from the set when the coding scheme is optimized for an estimated probability distribution q rather than the true distribution p.
• Minimal cross entropy is achieved when the distributions p and q are identical, i.e., when the cross entropy reduces to the entropy.
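In the same notation, with p the true distribution and q the estimated (coding) distribution, the cross entropy and the inequality the second bullet describes (Gibbs' inequality) are:

```latex
H(p, q) = -\sum_{x} p(x) \log_2 q(x), \qquad H(p, q) \ge H(p)
```

with equality exactly when q = p, at which point H(p, q) = H(p).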