neural_programmer_interpreter
TRANSCRIPT
![Page 1: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/1.jpg)
Neural Programmer-Interpreters
ICLR 2016 Best Paper Award
Scott Reed & Nando de Freitas Google DeepMind
citation: 19 London, UK
Katy, 2016/10/14
![Page 2: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/2.jpg)
Motivation
• ML is ultimately about automating tasks, hoping that machine can do everything for human
• For example, I want the machine to make a cup of coffee for me
![Page 3: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/3.jpg)
Motivation
• Ancient way: is to write full highly-detailed program specifications to carry them out
• AI way: come up with a lot of training examples that capture the variability in the real world, and then train some general learning machine on this large data set.
![Page 4: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/4.jpg)
Motivation
• but sometimes the dataset is not big enough! and it doesn’t generalize well..
• NPI is an attempt to use neural methods to train machines to carry out simple tasks based on a small amount of training data.
![Page 5: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/5.jpg)
NPI Goals• 1. Long-term prediction: Model potentially long sequences
of actions by exploiting compositional structure.
• 2. Continual learning: Learn new programs by composing previously- learned programs, rather than from scratch.
• 3. Data efficiency: Learn generalizable programs from a small number of example traces.
• 4. Interpretability: By looking at NPI’s generated commands, we can understand what it is doing at multiple levels of temporal abstraction.
![Page 6: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/6.jpg)
Related Work
• Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. "Sequence to sequence learning with neural networks." Advances in neural information processing systems. 2014.
• Graves, Alex, Greg Wayne, and Ivo Danihelka. "Neural turing machines." arXiv preprint arXiv:1410.5401 (2014).
![Page 7: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/7.jpg)
Sequence to sequence learning with neural networks
![Page 8: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/8.jpg)
Neural turing machines
http://cpmarkchang.logdown.com/posts/279710-neural-network-neural-turing-machine
![Page 9: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/9.jpg)
Outline
• NPI core module: how it works
• Demos
• Experiment
• Conclusion
![Page 10: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/10.jpg)
Outline
• NPI core module: how it works
• Demos
• Experiment
• Conclusion
![Page 11: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/11.jpg)
NPI core module• The NPI core is a LSTM network that acts as a
router between programs conditioned on the current state observation and previous hidden unit states
• input: a learnable program embedding, program arguments passed on by the calling program, and a feature representation of the environment.
• output: a key indicating what program to call next, arguments for the following program and a flag indicating whether the program should terminate.
core: an LSTM-based sequence model
![Page 12: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/12.jpg)
Adding Numbers Together
![Page 13: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/13.jpg)
Bubble Sort
![Page 14: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/14.jpg)
Car Rendering• Whatever the starting position, the program should generate a
trajectory of actions that delivers the camera to the target view, e.g. frontal pose at a 15◦ elevation.
![Page 15: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/15.jpg)
NPI Model
![Page 16: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/16.jpg)
How it Works
![Page 17: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/17.jpg)
How it Workse: environment
a: program argument p: embedded program vector
r(t): probability to terminate the current program
![Page 18: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/18.jpg)
How it Works
![Page 19: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/19.jpg)
How it Works
![Page 20: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/20.jpg)
Outline
• NPI core module: how it works
• Demos
• Experiment
• Conclusion
![Page 21: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/21.jpg)
Adding Numbers
• Environment:
• Scratch pad with the two numbers to be added, a carry row and output row.
• 4 read/write pointers location
• Program:
• LEFT, RIGHT programs that can move a carry pointer left or right, respectively.
• WRITE program that writes a specified value to the location of a specified pointer
![Page 22: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/22.jpg)
Adding Numbers
Actual trace of addition program generated by our model on the
problem shown to the left.
![Page 23: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/23.jpg)
Adding Numbers
• all output actions (primitive atomic actions that can be performed on the environment) are performed with a single instruction – ACT.
all output actions (primitive atomic actions that can be performed on the environment) are performed with a single
instruction – ACT.
![Page 24: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/24.jpg)
Adding Numbers Together
![Page 25: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/25.jpg)
Bubble Sort
• environment:
• Scratch pad with the array to be sorted.
• Read/Write pointers
![Page 26: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/26.jpg)
Bubble Sort
![Page 27: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/27.jpg)
Bubble Sort
![Page 28: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/28.jpg)
Car Rendering• Environment:
• Rendering of the car (pixels). (use CNN as feature encoder)
• The current car pose is NOT provided
• Target angle and elevation coordinates.
![Page 29: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/29.jpg)
Car Rendering
![Page 30: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/30.jpg)
Car Rendering• Whatever the starting position, the program should generate a
trajectory of actions that delivers the camera to the target view, e.g. frontal pose at a 15◦ elevation.
![Page 31: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/31.jpg)
GOTO
![Page 32: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/32.jpg)
HGOTO
• horizontal goto
![Page 33: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/33.jpg)
LGOTO
• Left goto
![Page 34: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/34.jpg)
ACT
• rotate 15 degree
![Page 35: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/35.jpg)
give control back to LGOTO
![Page 36: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/36.jpg)
core realized it haven’t done with horizontal rotation
![Page 37: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/37.jpg)
Control back to GOTO
![Page 38: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/38.jpg)
![Page 39: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/39.jpg)
Outline
• NPI core module: how it works
• Demos
• Experiment
• Conclusion
![Page 40: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/40.jpg)
Experiments
• Data Efficiency
• Generalization
• Learning new programs with a fixed NPI core
![Page 41: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/41.jpg)
Data Efficiency - Sorting• Seq2Seq LSTM and NPI
used the same number of layersand hidden units.
• Trained on length 20 arrays of single-digit numbers.
• NPI benefits from mining multiple subprogram examples per sorting instance
accuracy v.s. training example
![Page 42: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/42.jpg)
Generalization - Sorting• For each length 2 up
to 20, we provided 64 example bubble sort traces, for a total of 1216 examples.
• Then, we evaluated whether the network can learn to sort arrays beyond length 20
![Page 43: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/43.jpg)
Generalization - Adding
only train on sequence length up to 20
![Page 44: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/44.jpg)
Learning New Programs with a Fixed NPI Core
• example task: find the Max in array
• RJMP: move all pointers to the right by repeatedly calling RSHIFT program
• MAX: call BUBBLESORT and then RJMP
• Expand program memory by adding 2 slots. Randomly initialize, then learn by backpropagation with the NPI core and all other parameters fixed.
![Page 45: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/45.jpg)
• 1. Randomly initialize new program vectors in memory
• 2. Freeze core and other program vectors
• 3. Backpropagate gradients to new program vectors
![Page 46: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/46.jpg)
• + Max: performance after addition of MAX program to memory.
• “unseen” uses a test set with disjoint car models from the training set.
![Page 47: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/47.jpg)
Outline
• NPI core module: how it works
• Demos
• Experiment
• Conclusion
![Page 48: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/48.jpg)
Conclusion(1/2)
• NPI is a RNN/LSTM-based sequence-to-sequence translator with the ability to keep track of calling programs while recurse into sub-program
• NPI generalizes well in comparison to sequence-to-sequence LSTMs.
• A trained NPI with a fix core can learn new task while not forgetting about the old task
![Page 49: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/49.jpg)
Conclusion(2/2)
• provide far fewer examples, but where the labels contains richer information allowing the model to learn compositional structure(It’s like sending kids to school)
![Page 50: Neural_Programmer_Interpreter](https://reader034.vdocuments.us/reader034/viewer/2022042907/58cf1ed61a28abc05f8b5b05/html5/thumbnails/50.jpg)
Further Discussion
• Can each task help each other during training?
• Can we share environment encoder?
• Any comments?
project page: http://www-personal.umich.edu/~reedscot/iclr_project.html