

The Variational Coupled Gaussian Process Dynamical Model

Dmytro Velychko, Dominik Endres

Theoretical Neuroscience Group, Department of Psychology, Philipps-University, 35032 Marburg, Germany.

velychko@staff.uni-marburg.de, dominik.endres@uni-marburg.de

Introduction

Planning and execution of full-body movements is a hard control problem. Modular movement primitives (MPs) have been suggested as a means to simplify it while retaining a sufficient degree of control flexibility for a wide range of tasks [Bizzi et al., 2008]. A particularly well-developed type of MP in robotics is the dynamical movement primitive (DMP) [Schaal, 2006], but the form of the differential equation of a DMP remains fixed during learning, potentially reducing its representational capacity. The Coupled Gaussian Process Dynamical Model (CGPDM) [Velychko et al., 2014] combines the advantages of modularity and flexibility in the dynamics, at least theoretically. Non-parametric GP learning, however, has a high computational cost. We improve on this by employing sparse variational approximations [Titsias, 2009, Titsias and Lawrence, 2010, Frigola et al., 2014], obviating the need for sampling.

Coupled GPDMs model

The full dynamical system comprises M parts, where every part is a modular MP (here M = 2 for simplicity). We introduce M × M dynamics mappings in the latent space, each of which makes a prediction about one part of the latent space from the previous state of some (possibly other) part; these predictions are combined via a Product-of-Experts (PoE):

[Figure: graphical model of the coupled GPDM for M = 2. Pairwise dynamics mappings $f_{m,r}(X) \sim \mathcal{GP}(k_{m,r}(X^m, X^{m\prime}))$ with noise scales $\sigma_{m,r}$ produce partial predictions $X^{m,r}_t$, which are fused into the latent states $X^r_t$.]

Product-of-Experts for Gaussian experts: $x^r_i$ is the latent state, $x^{m,r}_i$ the partial prediction of expert $m = 1 \ldots M$ with noise scale $\sigma_{m,r}$:

$$\sigma_r^2 = \left( \sum_m \sigma_{m,r}^{-2} \right)^{-1} \quad (1)$$

$$p(x^r_i \mid x^{:,r}_i, \sigma_{:,r}) \propto \prod_m \mathcal{N}\left(x^r_i \mid x^{m,r}_i, \sigma^2_{m,r}\right) \;\Rightarrow\; p(x^r_i \mid x^{:,r}_i, \sigma_r) = \frac{\exp\left[ -\frac{1}{2\sigma_r^2} \left( x^r_i - \sigma_r^2 \sum_m \frac{x^{m,r}_i}{\sigma^2_{m,r}} \right)^2 \right]}{(2\pi\sigma_r^2)^{\dim(X^r)/2}} \quad (2)$$
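To make the fusion rule concrete, here is a minimal NumPy sketch of Eqs. (1)-(2) for scalar experts; the function name `poe_fuse` and its arguments are illustrative, not from the poster:

```python
import numpy as np

def poe_fuse(means, sigmas):
    """Fuse M scalar Gaussian expert predictions via a Product-of-Experts,
    following Eqs. (1)-(2): precisions add, and the fused mean is the
    precision-weighted average of the expert means."""
    precisions = 1.0 / np.asarray(sigmas) ** 2               # sigma_{m,r}^{-2}
    var = 1.0 / precisions.sum()                             # Eq. (1): sigma_r^2
    mean = var * (precisions * np.asarray(means)).sum()      # fused mean in Eq. (2)
    return mean, var
```

For example, `poe_fuse([0.0, 1.0], [1.0, 1.0])` returns `(0.5, 0.5)`: with equal precisions the fused prediction is the plain average, and the fused variance is halved.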

Integrate out the intermediate predictions and get a simplified model:

[Figure: the simplified graphical model after integrating out the partial predictions. Each part $r$ retains a single mapping $f_{r,r}(X) \sim \mathcal{GP}(k_r((X^1, X^2), (X^{1\prime}, X^{2\prime})))$ acting on the joint latent state.]

Covariance matrix of latent points x:

$$\Sigma + P K P^\top = \sigma_r^2 \mathbf{1}_{(N-1)\times(N-1)} + \sigma_r^4 \sum_{m=1}^{M} \frac{K_m}{\sigma_{m,r}^4} \quad (3)$$

Kernel function $k_r$ generating these covariance matrices:

$$k_r(X^{:}, X^{:\prime}) = \sigma_r^2\, \delta(X^{:}, X^{:\prime}) + \sigma_r^4 \sum_{m=1}^{M} \frac{k_{m,r}(X^m, X^{m\prime})}{\sigma_{m,r}^4} \quad (4)$$

$\sigma_{m,r}$ can be increased to control the coupling. Coupling strength:

$$\sigma^2_{\mathrm{rel}(i,j)} = \frac{\sigma_{j,j}^4}{\sigma_{i,j}^4} \quad (5)$$

Values close to 0 indicate no coupling; values close to 1 indicate strong coupling.
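A sketch of how the Gram matrix of Eq. (4) could be assembled, assuming an RBF stand-in for the component kernels $k_{m,r}$; all function and argument names here are hypothetical:

```python
import numpy as np

def rbf(X, Xp, lengthscale=1.0, variance=1.0):
    """Stand-in component kernel k_{m,r}; any valid kernel works."""
    d2 = ((X[:, None, :] - Xp[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def poe_kernel(X_parts, Xp_parts, sigma_mr, sigma_r, same_inputs=False):
    """Gram matrix of the PoE kernel k_r in Eq. (4).

    X_parts, Xp_parts: lists of (N, Q_m) / (N', Q_m) latent arrays, one per part m
    sigma_mr:          expert noise scales sigma_{m,r}
    sigma_r:           fused noise scale from Eq. (1)
    same_inputs:       True when X == X', activating the delta (white-noise) term
    """
    K = sum(rbf(Xm, Xpm) / s ** 4
            for Xm, Xpm, s in zip(X_parts, Xp_parts, sigma_mr))
    K = sigma_r ** 4 * K
    if same_inputs:
        K = K + sigma_r ** 2 * np.eye(K.shape[0])   # sigma_r^2 * delta(X, X') term
    return K
```

Note how weak experts (large $\sigma_{m,r}$) are down-weighted by the $\sigma_{m,r}^{-4}$ factors, which is exactly the handle Eq. (5) uses to quantify coupling strength.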

Augmented model

[Figure: graphical model of the augmented CGPDM. Second-order dynamics mappings $f^r(X^1_{t-2}, X^1_{t-1}, X^2_{t-2}, X^2_{t-1}) \sim \mathcal{GP}(\mu(\cdot), k(\cdot, \cdot))$ are augmented with inducing inputs $(z^r_0, \ldots, z^r_{l_r})$ and inducing outputs $U^r$; observation mappings $g^r(X^r) \sim \mathcal{GP}(\mu(\cdot), k(\cdot, \cdot))$ are augmented with inducing inputs $(r^r_0, \ldots, r^r_{j_r})$ and inducing outputs $V^r$.]

$Y$ - observed data; $X$ - latent points; $r_i \to v_i$ and $z_j \to u_j$ - augmenting mappings.

For each part:

$$X = \{x_0 \ldots x_T\}; \quad x_t \in \mathbb{R}^Q \qquad Y = \{y_0 \ldots y_T\}; \quad y_t \in \mathbb{R}^D$$
$$f(x_{-t}) \sim \mathcal{GP}(0, k_f(x_{-t}, x'_{-t})), \quad f_t = f(x_{-t}), \quad x_t \sim \mathcal{N}(f_t, I\alpha^2)$$
$$g_d(X) \sim \mathcal{GP}(0, k_g(x_t, x'_t)), \quad g_d = g_d(X), \quad Y_{:,d} \sim \mathcal{N}(g_d, I\beta^2)$$
$$p(x_0) = \mathcal{N}(x_0 \mid 0, I)$$
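A sketch of ancestral sampling from this per-part generative model, with plain callables standing in for draws from the dynamics and observation GPs (all names and signatures are assumptions for illustration):

```python
import numpy as np

def sample_part(T, Q, D, f, g, alpha, beta, order=2, seed=0):
    """Ancestral sampling from one part's generative model.

    f: callable approximating a draw from the dynamics GP, mapping the
       concatenated previous `order` latent states to R^Q
    g: callable approximating a draw from the observation GP, R^Q -> R^D
    """
    rng = np.random.default_rng(seed)
    X = [rng.standard_normal(Q) for _ in range(order)]     # p(x_0) = N(0, I)
    for t in range(order, T + 1):
        f_t = f(np.concatenate(X[t - order:t]))            # f_t = f(x_{-t})
        X.append(f_t + alpha * rng.standard_normal(Q))     # x_t ~ N(f_t, I alpha^2)
    Y = [g(x) + beta * rng.standard_normal(D) for x in X]  # y_t ~ N(g(x_t), I beta^2)
    return np.array(X), np.array(Y)
```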

Variational learning

Proposal variational distributions: $q(x_t) = \mathcal{N}(\mu_{x_t}, S_{x_t})$; $q(u)$ and $q(v)$ are unconstrained distributions. The full variational joint proposal posterior distribution is:

$$q(x, u, f, v, g) = p(y \mid g)\, p(g \mid x, v)\, q(v) \left[ \prod_{t=1}^{T} p(f_t \mid f_{1:t-1}, x_{0:t-1}, u) \right] q(x)\, q(u)$$

The ELBO:

$$\log p(y \mid \theta) \geq \mathcal{L}(\theta) = \int_{x,u,v,g,f} q(x, u, f, v, g) \log \frac{p(x, u, f, v, g, y)}{q(x, u, f, v, g)} \quad (6)$$

Separating the latent dynamics part and applying the sufficient-statistics assumption $q(x, f, u) = q(u)\, q(x) \prod_{t=1}^{T} p(f_t \mid x_{t-1}, u)$ [Frigola et al., 2014], we obtain a further (looser) lower bound:

$$\mathcal{L}(\theta) \geq \sum_{d=1}^{D} \int_{x,v,g_d} p(g_d \mid x, v)\, q(x)\, q(v) \log \frac{p(y_d \mid g_d)}{q(v)} + \int q(u) \log \frac{p(u)}{q(u)}\, du + H(q(x)) + \int q(u) \left[ \sum_{t=1}^{T} \int q(x_t)\, q(x_{-t}) \left( \int p(f_t \mid x_{-t}, u) \log p(x_t \mid f_t)\, df_t \right) dx_t\, dx_{-t} \right] du$$

where $H(q(x))$ is the entropy of $q(x)$. The first integral is given in [Titsias and Lawrence, 2010]; the remaining integral is derived below. The innermost integral:

$$A = \int p(f_t \mid x_{-t}, u) \log p(x_t \mid f_t)\, df_t = \log \mathcal{N}\left(x_t \mid K_{x_{-t},z} K_{z,z}^{-1} u, I\alpha\right) - \frac{1}{2} \operatorname{tr}\left( \alpha^{-1} K_{x_{-t},x_{-t}} - \alpha^{-1} K_{x_{-t},z} K_{z,z}^{-1} K_{x_{-t},z}^\top \right)$$
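As a sanity check on the derivation, A can be evaluated directly; a sketch for a one-dimensional latent state (shapes and names are assumptions):

```python
import numpy as np

def inner_integral_A(x_t, Kxz, Kzz, kxx, u, alpha):
    """A = E_{p(f_t | x_{-t}, u)}[log p(x_t | f_t)] for one time step,
    with a one-dimensional latent state for clarity.

    x_t:  scalar current latent state
    Kxz:  (l,) cross-covariances k_f(x_{-t}, z_j)
    Kzz:  (l, l) inducing Gram matrix
    kxx:  scalar prior variance k_f(x_{-t}, x_{-t})
    u:    (l,) inducing outputs
    alpha: process noise variance
    """
    mean = Kxz @ np.linalg.solve(Kzz, u)              # K_{x,z} K_{z,z}^{-1} u
    log_gauss = (-0.5 * np.log(2 * np.pi * alpha)
                 - 0.5 * (x_t - mean) ** 2 / alpha)   # log N(x_t | mean, alpha)
    defect = kxx - Kxz @ np.linalg.solve(Kzz, Kxz)    # K_{x,x} - K_{x,z} K^{-1}_{z,z} K_{z,x}
    return log_gauss - 0.5 * defect / alpha           # variance-correction (trace) term
```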

The integrals over $dx_t\, dx_{-t}$:

$$B = \int q(x_t)\, q(x_{-t})\, A\, dx_t\, dx_{-t} = -\log(Z_{I\alpha}) - \frac{1}{2} \alpha^{-1} u^\top K_{z,z}^{-1} \Psi_2(x_{-t}) K_{z,z}^{-1} u + \alpha^{-1} \mu_{x_t}^\top \Psi_1(x_{-t}) K_{z,z}^{-1} u - \frac{1}{2} \alpha^{-1} \mu_{x_t}^\top \mu_{x_t} - \frac{1}{2} \operatorname{tr}\left( \alpha^{-1} \Psi_0(x_{-t}) - \alpha^{-1} K_{z,z}^{-1} \Psi_2(x_{-t}) \right)$$

where $Z_\Sigma = \sqrt{|2\pi\Sigma|}$ is the normalization coefficient of a multivariate Gaussian with covariance $\Sigma$.
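The $\Psi$ statistics are expectations of kernel quantities under $q(x_{-t})$; closed forms for the RBF kernel are given in [Titsias and Lawrence, 2010]. Below is a Monte-Carlo sketch that only illustrates which expectations are being taken, not an efficient implementation:

```python
import numpy as np

def psi_stats_mc(mu, S, Z, kernel, n_samples=2000, seed=0):
    """Monte-Carlo estimates of the Psi statistics appearing in B:
    Psi_0 = E[k(x,x)], Psi_1 = E[K_{x,z}], Psi_2 = E[K_{z,x} K_{x,z}],
    with expectations under q(x_{-t}) = N(mu, S).

    mu: (Q,) mean, S: (Q, Q) covariance, Z: (l, Q) inducing inputs,
    kernel: callable k(A, B) -> Gram matrix.
    """
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(S)
    xs = mu + rng.standard_normal((n_samples, mu.size)) @ L.T
    Kxz = kernel(xs, Z)                                   # (n_samples, l)
    psi0 = np.mean(np.diag(kernel(xs, xs)))               # scalar E[k(x, x)]
    psi1 = Kxz.mean(axis=0)                               # (l,)  E[K_{x,z}]
    psi2 = (Kxz[:, :, None] * Kxz[:, None, :]).mean(0)    # (l, l) E[K_{z,x} K_{x,z}]
    return psi0, psi1, psi2
```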

The sum over $T$ can then be written as a quadratic form:

$$C = \sum_{t=1}^{T} B = -\frac{1}{2} u^\top F u + u^\top G + H = -\frac{1}{2} (u - F^{-1} G)^\top F (u - F^{-1} G) + \frac{1}{2} G^\top F^{-1} G + H$$

$$F = \alpha^{-1} K_{z,z}^{-1} \left[ \sum_{t=1}^{T} \Psi_2(x_{-t}) \right] K_{z,z}^{-1} \qquad G = \left( \alpha^{-1} \left[ \sum_{t=1}^{T} \mu_{x_t}^\top \Psi_1(x_{-t}) \right] K_{z,z}^{-1} \right)^\top$$

$$H = \sum_{t=1}^{T} \left[ -\log(Z_{I\alpha}) - \frac{1}{2} \alpha^{-1} \left( \mu_{x_t}^\top \mu_{x_t} + \operatorname{tr}(S_{x_t}) \right) - \frac{1}{2} \operatorname{tr}\left( \alpha^{-1} \left( \Psi_0(x_{-t}) - K_{z,z}^{-1} \Psi_2(x_{-t}) \right) \right) \right]$$

The dynamics ELBO, also accounting for the optimal variational $q(u)$:

$$\mathcal{L}_{\mathrm{dyn}}(\theta) \geq \log \int p(u) \exp(C)\, du + H(q(x)) = -\log Z(F^{-1} + K_{z,z}) - \frac{1}{2} G^\top F^{-1} (F^{-1} + K_{z,z})^{-1} F^{-1} G + \log Z(F^{-1}) + \frac{1}{2} G^\top F^{-1} G + H + H(q(x))$$
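Given the sufficient statistics $F$, $G$ and $H$, the collapsed bound amounts to a few matrix operations; a minimal sketch assuming $F$ is invertible (a numerically careful implementation would work with Cholesky factors instead of explicit inverses):

```python
import numpy as np

def log_Z(Sigma):
    """log Z(Sigma) = log sqrt(|2 pi Sigma|)."""
    _, logdet = np.linalg.slogdet(2.0 * np.pi * Sigma)
    return 0.5 * logdet

def dyn_elbo(F, G, H, Kzz, H_qx):
    """Collapsed dynamics bound L_dyn after the optimal q(u) is plugged in.

    F: (l, l) quadratic-form matrix, G: (l,) linear term, H: scalar,
    Kzz: (l, l) inducing Gram matrix, H_qx: entropy H(q(x)).
    """
    Finv = np.linalg.inv(F)
    A = Finv + Kzz
    FinvG = Finv @ G
    return (-log_Z(A)                                   # -log Z(F^-1 + Kzz)
            - 0.5 * FinvG @ np.linalg.solve(A, FinvG)   # quadratic term in G
            + log_Z(Finv)                               # +log Z(F^-1)
            + 0.5 * G @ FinvG                           # +1/2 G^T F^-1 G
            + H + H_qx)
```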

Due to the factorising property of the PoE kernel, the total ELBO for multiple interacting parts is the sum of the ELBOs of the individual parts.

Toy dataset

ARD RBF kernel, second-order dynamics, 4 inducing inputs for the dynamics mapping (circles), 8 inducing inputs for the X → Y mapping (pluses).

[Figure: four panels (a)-(d).] PCA-initialised latents (a) and optimized latent and inducing points (b). Mean prediction was used to generate trajectories in latent (c) and observed (d) space, overlaid with the noisy training data. Explained variance is ≈ 0.93 on the training data and ≈ 0.91 on test data not used for training.

Summary

• Presented variational learning for coupled dynamics in latent space, which constitute a type of modular MP
• Demonstrated examples with synthetic data

Future work

• Check the advantage of variational learning vs. MAP
• Train on real MoCap datasets
• Sensory-motor integration as interaction in latent-space dynamics
• Deep models

Acknowledgements

We acknowledge funding from the DFG under IRTG 1901 'The Brain in Action' and the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 611909 (KoroiBot).

References

[Bizzi et al., 2008] Bizzi, E., Cheung, V., d'Avella, A., Saltiel, P., and Tresch, M. (2008). Combining modules for movement. Brain Research Reviews, 57(1):125-133. Networks in Motion.

[Frigola et al., 2014] Frigola, R., Chen, Y., and Rasmussen, C. (2014). Variational Gaussian process state-space models. In Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N., and Weinberger, K., editors, Advances in Neural Information Processing Systems 27, pages 3680-3688. Curran Associates, Inc.

[Schaal, 2006] Schaal, S. (2006). Dynamic movement primitives: a framework for motor control in humans and humanoid robotics. In Kimura, H., Tsuchiya, K., Ishiguro, A., and Witte, H., editors, Adaptive Motion of Animals and Machines, pages 261-280. Springer Tokyo.

[Titsias, 2009] Titsias, M. K. (2009). Variational learning of inducing variables in sparse Gaussian processes. In van Dyk, D. A. and Welling, M., editors, AISTATS, volume 5 of JMLR Proceedings, pages 567-574. JMLR.org.

[Titsias and Lawrence, 2010] Titsias, M. K. and Lawrence, N. D. (2010). Bayesian Gaussian process latent variable model. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, AISTATS 2010, Chia Laguna Resort, Sardinia, Italy, May 13-15, 2010, pages 844-851.

[Velychko et al., 2014] Velychko, D., Endres, D., Taubert, N., and Giese, M. A. (2014). Coupling Gaussian process dynamical models with product-of-experts kernels. In Proceedings of the 24th International Conference on Artificial Neural Networks, LNCS 8681, pages 603-610. Springer.