scene understanding without labels - university of … · s. m. ali eslami, nicolas heess,...
TRANSCRIPT
Learning paradigms
Decoder
x
z
xx
z
x
SupervisedLearning
Reinforcement Learning
GenerativeModelling
y
z
a yhorse left
Learning paradigms
Decoder
x
z
xx
z
x
SupervisedLearning
Reinforcement Learning
GenerativeModelling
y
z
a y
(2.3, -1, 0.5, 3)
not blinkinghorse left
Highly structured
General Purpose Graphics Programming
Vikash Mansinghka, Tejas D. Kulkarni, Yura N. Perov, and Joshua B. Tenenbaum (2013)
Recurrent Neural Networksfor Image Generation
Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, Daan Wierstra (2015)
Variational Inference
Approximate p(z|x) using q(z|x)
Parameterise q(z|x) by deep network
Minimise KL[ q(z|x) | p(z|x) ] via SGD
Model
x
z
x
Recurrent Neural Networks for Image Generation
c
x
z
p(x|c)D
ecod
ing
Gen
erat
ion
Enc
odin
gIn
fere
nce
Hinton (2006)
Read Read
ct+2
x
Write
z
Recurrent Neural Networks for Image Generation
c
x
z
p(x|c)D
ecod
ing
Gen
erat
ion
Enc
odin
gIn
fere
nce
Write
Read
ct
x
ct+1
x
Write
p(x|cT)
z z
Gregor et al. (2015)Hinton (2006)
Attend, Infer, Repeat: Fast Scene Understanding with Generative Models
S. M. Ali Eslami, Nicolas Heess, Theophane Weber, Yuval Tassa, Koray Kavukcuoglu, Geoffrey Hinton (2016)
x
z
blue brick pile of bricks
x
z
Mod
elIm
age
Cau
se
not sufficient forgraspingcountingtransfergeneralisation
x
z
Mod
elIm
age
Cau
se
x
zwhat
y1
z1 zwherez1 zwhat
y2
z2 zwherez2
atty1
atty2
blue brick red brickpile of bricks blue brickabove
red brickbelow
x
z1 z2
Decoder
x y
h1 h2 h3
z1 z2 z3 Decoder
x y
h1 h2 h3
zpresz1 zpresz2 zpresz3zwhatz1 zwhatz2 zwhatz3zwherez1 zwherez2 zwherez3
x
zwhat
y1
z1 zwherez1 zwhat
y2
z2 zwherez2
atty1
atty2
Mod
elIn
fere
nce
Net
wor
k
x
z1 z2 z3
Decoder
x y
h1 h2 h3
zpresz1 zpresz2 zpresz3zwhatz1 zwhatz2 zwhatz3zwherez1 zwherez2 zwherez3
x
zwhat
y1
z1 zwherez1 zwhat
y2
z2 zwherez2
atty1
atty21. Build in structure
Get out meaning
2. Inference networks that area. recurrentb. variable-lengthc. attentive
3. End-to-end learning througha. discrete, continuous varsb. inference and model nets
Recap: Key ideas
x
z
Additional structure
x
z
distributed vector that correlates with blue brick
class=brickcolour=blueposition=Protation=R
learned
specified
Unsupervised Learning of 3D Structure from Images
Danilo Rezende, S. M. Ali Eslami, Shakir Mohamed, Peter Battaglia, Max Jaderberg, Nicolas Heess (2016)
How to recover 3D structure from 2D images?To form stable representations, regardless of camera position
Motivation
How to recover 3D structure from 2D images?To form stable representations, regardless of camera position
● Inherently ill-posed○ All objects appear under self occlusion, infinite explanations○ Therefore build statistical models to know what’s likely and what’s not
● Even with models, inference is intractable○ Important to capture multi-modal explanations
● How are 3D scenes best represented?○ Meshes or voxels?
● Where is training data collected from?
Motivation
Unsupervised Learning of 3D Structure from Images
Unconditional inference
Unsupervised Learning of 3D Structure from Images
Inferring object meshes
● Deep Supervised Learning
● Deep Reinforcement Learning
● Model-based Methods
● Deep Variational Inference
● Structured / Unstructured Generative Models
Recap