PyCon JP 2017 TRANSCRIPT
Edward
2017-09-09 @ PyCon JP 2017
Yuta Kashino
BakFoo, Inc. CEO / Astro Physics / Observational Cosmology / Zope / Python / Realtime Data Platform for Enterprise / Prototyping
- arXiv
- PyCon JP 2015: Python
- PyCon JP 2016
- PyCon JP 2017: DNN + PPL = Edward
- @yutakashino
- Bayesian deep learning
- Edward
Bayesian Deep Learning
- NIPS Bayesian Deep Learning workshop: http://bayesiandeeplearning.org/
- Shakir Mohamed: http://blog.shakirm.com/wp-content/uploads/2015/11/CSML_BayesDeep.pdf
A brief history:
- Denker, Schwartz, Wittner, Solla, Howard, Jackel, Hopfield (1987)
- Denker and LeCun (1991)
- MacKay (1992)
- Hinton and van Camp (1993)
- Neal (1995)
- Barber and Bishop (1998)
- Graves (2011)
- Blundell, Cornebise, Kavukcuoglu, and Wierstra (2015)
- Hernandez-Lobato and Adams (2015)

Researchers:
- Yarin Gal, Zoubin Ghahramani (U of Cambridge)
- Shakir Mohamed
- Dustin Tran, Rajesh Ranganath, David Blei (Columbia U)
- Ian Goodfellow
Deep neural networks:
- Training: SGD + BackProp
[Figure: a two-layer neural network — inputs x1 … xd, first-layer weights θ(1), second-layer weights θ(2), output y]

  y^(n) = Σ_j θ_j^(2) σ( Σ_i θ_ji^(1) x_i^(n) ) + ε^(n)

  p(y^(n) | x^(n), θ) = σ( Σ_i θ_i x_i^(n) )

Weights θ, data D = {x^(n), y^(n)}_{n=1}^N = (X, y)
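The regression model above can be sketched in plain Python. This is a minimal illustration, not the talk's code: the weights `theta1`, `theta2`, the input, and the noise scale are all made-up values for a 2-input, 3-hidden-unit network.

```python
import math
import random

def forward(theta1, theta2, x):
    """Two-layer net: y = sum_j theta2[j] * sigma(sum_i theta1[j][i] * x[i])."""
    sigma = lambda a: 1.0 / (1.0 + math.exp(-a))  # logistic nonlinearity
    hidden = [sigma(sum(w * xi for w, xi in zip(row, x))) for row in theta1]
    return sum(t * h for t, h in zip(theta2, hidden))

# Made-up weights for a 2-input, 3-hidden-unit network.
theta1 = [[0.5, -0.2], [0.1, 0.8], [-0.4, 0.3]]
theta2 = [1.0, -0.5, 0.7]

random.seed(0)
x = [1.0, 2.0]
eps = random.gauss(0.0, 0.1)          # observation noise eps^(n)
y = forward(theta1, theta2, x) + eps  # one noisy observation y^(n)
```

Ordinary deep learning fits a single point estimate of θ to D by SGD + BackProp; the Bayesian treatment below instead puts a distribution over θ.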
Why deep learning now:
- 2012: ILSVRC breakthrough → surpassed human-level accuracy by 2015
What made it work:
- Techniques: ReLU, DropOut, Mini Batch, SGD (Adam), LSTM, …
- Data: ImageNet, MSCoCo, …
- Compute: GPU, …
- Frameworks: Theano, Torch, Caffe, TensorFlow, Chainer, MxNet, PyTorch, …
Deep learning in practice:
- https://lossfunctions.tumblr.com/
Open problems:
- Adversarial examples
Deep neural networks (recap):
- Training: SGD + BackProp
[The two-layer network figure and equations are shown again: y^(n) = Σ_j θ_j^(2) σ(Σ_i θ_ji^(1) x_i^(n)) + ε^(n), weights θ, data D = {x^(n), y^(n)}_{n=1}^N = (X, y)]
Two routes to a Bayesian DNN:
1. Bayesian inference over the weights θ
2. DropOut as approximate Bayesian inference
1. Bayesian inference
- data D, hypothesis H
- Bayes' rule:

  P(H | D) = P(H) P(D | H) / Σ_H P(H) P(D | H)

  (posterior = prior × likelihood / evidence)

- Sum rule: P(x) = Σ_y P(x, y)
- Product rule: P(x, y) = P(x) P(y | x)
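The sum and product rules can be checked mechanically on any discrete joint distribution. The joint table below is a made-up two-variable example, not data from the talk:

```python
# A small made-up joint distribution P(x, y) over weather and ground state.
P_joint = {
    ("rain", "wet"): 0.3, ("rain", "dry"): 0.1,
    ("sun",  "wet"): 0.1, ("sun",  "dry"): 0.5,
}

def P_x(x):
    """Sum rule: P(x) = sum_y P(x, y)."""
    return sum(p for (xi, _), p in P_joint.items() if xi == x)

def P_y_given_x(y, x):
    """Conditional obtained from the joint and the marginal."""
    return P_joint[(x, y)] / P_x(x)

# Product rule: P(x, y) = P(x) P(y | x) holds for every cell of the table.
for (x, y), p in P_joint.items():
    assert abs(p - P_x(x) * P_y_given_x(y, x)) < 1e-12
```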
1. Bayesian machine learning
- Learning (m: model):

  P(θ | D, m) = P(D | θ, m) P(θ | m) / P(D | m)

  (posterior = likelihood × prior / evidence)

- Prediction:

  P(x | D, m) = ∫ P(x | θ, D, m) P(θ | D, m) dθ

- Model comparison:

  P(m | D) = P(D | m) P(m) / P(D)

- Example prior: θ ~ Beta(θ | 2, 2)
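For the Beta prior above, "learning" has a closed form because Beta is conjugate to the Bernoulli likelihood: Beta(a, b) plus n heads out of N flips gives Beta(a + n, b + N − n). A sketch with made-up coin-flip data:

```python
def beta_bernoulli_posterior(a, b, flips):
    """Conjugate update: Beta(a, b) prior + Bernoulli observations
    -> Beta(a + heads, b + tails) posterior."""
    heads = sum(flips)
    return a + heads, b + len(flips) - heads

flips = [1, 0, 1, 1, 0, 1, 1, 1]           # made-up data: 6 heads, 2 tails
a_post, b_post = beta_bernoulli_posterior(2, 2, flips)
post_mean = a_post / (a_post + b_post)     # posterior mean of theta
```

Most models (neural networks included) have no such closed form, which is why the MCMC and variational methods below are needed.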
1. Bayesian neural network
- Put a prior on the weights θ of the network (inputs x1 … xd, weights θ(1), θ(2), output y)
- Data: D = {x^(n), y^(n)}_{n=1}^N = (X, y)
- Posterior (m: model):

  P(θ | D, m) = P(D | θ, m) P(θ | m) / P(D | m)
1. Computing the posterior
- Sampling (MCMC)
- Variational Inference

  P(θ | D, m) = P(D | θ, m) P(θ | m) / P(D | m),
  evidence: P(D | m) = ∫ P(D | θ, m) P(θ) dθ
1. MCMC
- Draw samples from the unnormalized posterior:

  P(θ | D, m) ∝ P(D | θ, m) P(θ | m)
  (posterior ∝ likelihood × prior)

- Visualizing posterior samples: https://github.com/dfm/corner.py
- Visualizing MCMC: http://twiecki.github.io/blog/2014/01/02/visualizing-mcmc/
- Samplers: Metropolis-Hastings, NUTS (HMC)
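A minimal random-walk Metropolis-Hastings sampler for the Beta-Bernoulli posterior makes the idea concrete. This is a from-scratch sketch with made-up data (6 heads, 2 tails) and hand-picked step size, not the NUTS/HMC samplers a real PPL would use:

```python
import math
import random

def log_post(theta, heads, tails, a=2.0, b=2.0):
    """Unnormalized log posterior: Beta(a, b) prior x Bernoulli likelihood."""
    if not 0.0 < theta < 1.0:
        return float("-inf")
    return ((a - 1 + heads) * math.log(theta)
            + (b - 1 + tails) * math.log(1.0 - theta))

def metropolis_hastings(heads, tails, n_steps=20000, step=0.1, seed=42):
    rng = random.Random(seed)
    theta, samples = 0.5, []
    for _ in range(n_steps):
        prop = theta + rng.gauss(0.0, step)   # symmetric random-walk proposal
        log_alpha = log_post(prop, heads, tails) - log_post(theta, heads, tails)
        if rng.random() < math.exp(min(0.0, log_alpha)):
            theta = prop                      # accept; otherwise keep theta
        samples.append(theta)
    return samples[n_steps // 2:]             # drop the first half as burn-in

samples = metropolis_hastings(heads=6, tails=2)
post_mean = sum(samples) / len(samples)       # exact posterior mean is 8/12
```

Only the unnormalized posterior is needed — the intractable evidence P(D | m) cancels in the acceptance ratio, which is what makes MCMC applicable to neural-network posteriors.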
1. Variational Inference
- Approximate the posterior P(θ | D, m) with a tractable distribution q(θ; λ), choosing λ to minimize the KL divergence — equivalently, to maximize the ELBO:

  λ* = argmin_λ KL(q(θ; λ) || p(θ | D))
     = argmin_λ E_q(θ;λ)[log q(θ; λ) − log p(θ | D)]

  ELBO(λ) = E_q(θ;λ)[log p(θ, D) − log q(θ; λ)]

  λ* = argmax_λ ELBO(λ)
1. Variational Inference
- Minimizing the KL divergence = maximizing the ELBO
- Iteratively fit q(θ; λ) to p(θ, D): q(θ; λ1) → … → q(θ; λ5)

  ELBO(λ) = E_q(θ;λ)[log p(θ, D) − log q(θ; λ)],  λ* = argmax_λ ELBO(λ)
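The ELBO can be estimated by Monte Carlo: sample θ from q and average log p(θ, D) − log q(θ). The sketch below fits a Beta(a, b) variational family to the Beta-Bernoulli model by crude grid search over (a, b) — made-up data (6 heads, 2 tails), and grid search standing in for the gradient-based optimization real VI uses. Since the exact posterior Beta(8, 4) lies inside the family, the search should land on or near it:

```python
import math
import random

def log_beta_pdf(theta, a, b):
    """log Beta(theta | a, b) density."""
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta) - log_norm

def log_joint(theta, heads=6, tails=2):
    """log p(theta, D): Beta(2, 2) prior times Bernoulli likelihood."""
    return (log_beta_pdf(theta, 2, 2)
            + heads * math.log(theta) + tails * math.log(1 - theta))

def elbo(a, b, n_mc=2000, seed=0):
    """Monte Carlo estimate of ELBO = E_q[log p(theta, D) - log q(theta)]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_mc):
        theta = rng.betavariate(a, b)     # theta ~ q(theta) = Beta(a, b)
        total += log_joint(theta) - log_beta_pdf(theta, a, b)
    return total / n_mc

# Crude "inference": pick the variational parameters on a grid.
grid = [(a, b) for a in range(1, 13) for b in range(1, 13)]
a_star, b_star = max(grid, key=lambda ab: elbo(*ab))
```

Note that the ELBO only involves the joint p(θ, D), never the evidence — the same property that ADVI and BBVI exploit at scale with stochastic gradients.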
1. Variational Inference in practice
- Fitting q to p by stochastic optimization:
  - ADVI: Automatic Differentiation Variational Inference (arXiv:1603.00788)
  - BBVI: Blackbox Variational Inference (arXiv:1401.0118)
- Example: https://github.com/HIPS/autograd/blob/master/examples/bayesian_neural_net.py
1. Learning more about VI
- David MacKay, “Lecture 14 of the Cambridge Course”: http://www.inference.org.uk/itprnn_lectures/
- PRML, Chapter 10

1. References
- Zoubin Ghahramani, “History of Bayesian neural networks”, NIPS 2016 Workshop on Bayesian Deep Learning
- Yarin Gal, “Bayesian Deep Learning”, O'Reilly Artificial Intelligence, New York, 2017
2. DropOut
[Recap of the two-layer network: weights θ(1), θ(2), y^(n) = Σ_j θ_j^(2) σ(Σ_i θ_ji^(1) x_i^(n)) + ε^(n), data D = {x^(n), y^(n)}_{n=1}^N = (X, y), trained with SGD + BackProp]
Dropout
2. Dropout — Yarin Gal, “Uncertainty in Deep Learning”
- Dropout at test time can be interpreted as approximate Bayesian inference
- Dropout applied after conv layers as well
- Example: LeNet with Dropout
- http://mlg.eng.cam.ac.uk/yarin/blog_2248.html

2. Dropout — LeNet as a Bayesian DNN
- MNIST with Dropout after the conv layers
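The MC-dropout recipe is: keep dropout active at test time, run many stochastic forward passes, and read the spread of the outputs as predictive uncertainty. A toy sketch for a single linear layer — the weights, input, and dropout rate below are all made up for illustration:

```python
import random

def predict_with_dropout(weights, x, p_drop, rng):
    """One stochastic forward pass: each weight dropped with probability p_drop."""
    kept = [w if rng.random() > p_drop else 0.0 for w in weights]
    scale = 1.0 / (1.0 - p_drop)           # inverted-dropout rescaling
    return scale * sum(w * xi for w, xi in zip(kept, x))

def mc_dropout_predict(weights, x, p_drop=0.5, n_passes=2000, seed=1):
    """Predictive mean and variance from repeated stochastic passes."""
    rng = random.Random(seed)
    ys = [predict_with_dropout(weights, x, p_drop, rng) for _ in range(n_passes)]
    mean = sum(ys) / len(ys)
    var = sum((y - mean) ** 2 for y in ys) / len(ys)
    return mean, var

weights = [0.8, -0.3, 0.5]                 # made-up "trained" weights
mean, var = mc_dropout_predict(weights, [1.0, 1.0, 1.0])
# mean stays near the deterministic output; var quantifies uncertainty
```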
2. Dropout — predictive uncertainty on the CO2 dataset
[Recap: place a prior on the network weights θ and compute the posterior
P(θ | D, m) = P(D | θ, m) P(θ | m) / P(D | m)
by sampling (MCMC) or Variational Inference]
Edward
- Dustin Tran (OpenAI), with the Blei Lab
- Probabilistic programming library (PPL)
- First released in February 2016
- Built on Python and TensorFlow
- Named after the statistician George Edward Pelham Box (Box-Cox transform, Box-Jenkins, Ljung-Box test — the box plot is Tukey's); married to a daughter of R.A. Fisher
- Probabilistic Programming Library/Language: Stan, PyMC3, Anglican, Church, Venture, Figaro, WebPPL, Edward
- Python-based: Edward / PyMC3
- Inference: sampling and Variational Inference (VI)
  - Sampling: Metropolis-Hastings, Hamiltonian Monte Carlo, Stochastic Gradient Langevin Dynamics, No-U-Turn Sampler
  - VI: Blackbox Variational Inference, Automatic Differentiation Variational Inference
Edward
- Edward = TensorFlow (TF) + probabilistic programming (PPL)
- TF: computational graphs and automatic differentiation
- PPL: modeling + inference + criticism
- Edward builds the PPL directly on TensorFlow
1. TF: computational graph
- Computation is expressed as a dataflow graph
1. TF: automatic differentiation
1. TF: scale
- Distributed execution
- GPU / TPU
- e.g. Inception v3 / Inception v4 — # of parameters: 42,679,816; # of layers: 48
1. TF: ecosystem — Keras, Slim
- TensorBoard
1. TF: probability distributions
- tf.contrib.distributions
2. Random variables in edward

  x* ~ P(x | α)
  θ* ~ Beta(θ | 1, 1)

2. Example model: Beta-Bernoulli (coin flipping)

  p(x, θ) = Beta(θ | 1, 1) Π_{n=1}^{50} Bernoulli(x_n | θ)
2. Random variable methods
- log density: log_prob()
- mean: mean()
- sampling: sample()
- tf.contrib.distributions provides 51 distributions: https://www.tensorflow.org/api_docs/python/tf/contrib/distributions
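The three-method interface above can be mimicked in a few lines of plain Python. This toy Bernoulli class is only a stand-in to show the shape of the API — in Edward the same methods operate on TensorFlow tensors inside the graph:

```python
import math
import random

class Bernoulli:
    """Toy stand-in for a Bernoulli random variable exposing the
    log_prob() / mean() / sample() interface from the slides."""
    def __init__(self, p):
        self.p = p

    def log_prob(self, x):
        """log P(x) for x in {0, 1}."""
        return math.log(self.p if x == 1 else 1.0 - self.p)

    def mean(self):
        return self.p

    def sample(self, rng=random):
        """Draw one value in {0, 1}."""
        return 1 if rng.random() < self.p else 0

x = Bernoulli(0.3)
```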
3. A model in Edward is a TensorFlow graph

3. Example: Bayesian RNN
[Figure: an unrolled RNN — inputs x_{t−1}, x_t, hidden states h_{t−1}, h_t, outputs y_{t−1}, y_t, shared weights Wh, Wx, Wy and biases bh, by]

  h_t = tanh(W_h h_{t−1} + W_x x_t + b_h)
  y_t ~ Normal(W_y h_t + b_y, 1)
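The recurrence above, written out in plain Python for the scalar case. The weights and the input sequence are made-up toys; in the Edward version the W's and b's would themselves be random variables in a TensorFlow graph:

```python
import math
import random

def rnn_step(h_prev, x_t, Wh, Wx, bh):
    """Hidden-state update: h_t = tanh(Wh * h_{t-1} + Wx * x_t + bh)."""
    return math.tanh(Wh * h_prev + Wx * x_t + bh)

def rnn_emit(h_t, Wy, by, rng):
    """Observation model: y_t ~ Normal(Wy * h_t + by, 1)."""
    return rng.gauss(Wy * h_t + by, 1.0)

# Made-up scalar weights and a short made-up input sequence.
Wh, Wx, bh, Wy, by = 0.5, 1.0, 0.0, 2.0, 0.1
rng = random.Random(0)
h, ys = 0.0, []
for x_t in [0.2, -0.1, 0.4]:
    h = rnn_step(h, x_t, Wh, Wx, bh)   # deterministic recurrence
    ys.append(rnn_emit(h, Wy, by, rng))  # stochastic emission
```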
3. More examples: http://edwardlib.org/tutorials/
4. Inference
[Recap: infer the posterior P(θ | D, m) = P(D | θ, m) P(θ | m) / P(D | m) over the weights θ by sampling (MCMC) or Variational Inference]
4. Inference in Edward
- Edward supports MCMC
4. Example: MCMC
- Edward variational inference: KLqp
4. Criticism
5. Box's loop — after George Edward Pelham Box
(Blei, 2014)
5. Box's loop in Edward
Summary: Edward
- Edward = TensorFlow + modeling + inference + criticism
- Built directly on TensorFlow
- Inherits the TF ecosystem: GPU, TPU, TensorBoard, Keras
References
- D. Tran, A. Kucukelbir, A. Dieng, M. Rudolph, D. Liang, and D.M. Blei. Edward: A library for probabilistic modeling, inference, and criticism. arXiv preprint arXiv:1610.09787.
- D. Tran, M.D. Hoffman, R.A. Saurous, E. Brevdo, K. Murphy, and D.M. Blei. Deep probabilistic programming. arXiv preprint arXiv:1701.03757.
- Box, G. E. (1976). Science and statistics. Journal of the American Statistical Association, 71(356), 791–799.
- D.M. Blei. Build, Compute, Critique, Repeat: Data Analysis with Latent Variable Models. Annual Review of Statistics and Its Application, Volume 1, 2014.
Thank you
- Edward

[Closing slides: BakFoo, Inc. projects — NHK NMAPS; PyCon JP 2015: Python; SNS]