
Page 1:

DeepCoder: Learning to Write Programs

Matej Balog, Alexander L. Gaunt, Marc Brockschmidt, Sebastian Nowozin, Daniel Tarlow

Presented by Tambet Matiisen
ASE seminar, 20.02.2018

Page 2:

The Task

Inductive Program Synthesis (IPS) - given a set of input-output pairs, generate the (shortest) program that converts the inputs into the outputs.
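For instance (a toy pair in the style of the paper's examples; the concrete numbers are illustrative), a synthesizer given input = [-17, -3, 4, 11, 0, -5, 1, 7] and output = [-12, -20, -68] should recover a program like the one below, shown as plain Python with the corresponding DSL operations in comments:

```python
# Hypothetical IPS instance: a program consistent with the pair
#   input  = [-17, -3, 4, 11, 0, -5, 1, 7]
#   output = [-12, -20, -68]
def program(a):
    b = [x for x in a if x < 0]   # FILTER (<0)
    c = [4 * x for x in b]        # MAP (*4)
    d = sorted(c)                 # SORT
    return d[::-1]                # REVERSE

print(program([-17, -3, 4, 11, 0, -5, 1, 7]))  # -> [-12, -20, -68]
```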

Page 3:

The Idea

● Use a neural network to predict, from the input-output pairs, which functions were used in the program.

● Use those predictions to prioritize an exhaustive search.

Page 4:

The Method

1. Define Domain Specific Language (DSL)
2. Generate programs and input-output examples
3. Train neural network to predict functions from input-output pairs
4. Perform program search using classical methods

Page 5:

The Method

1. Define Domain Specific Language (DSL)
2. Generate programs and input-output examples
3. Train neural network to predict functions from input-output pairs
4. Perform program search using classical methods

Page 6:

Domain Specific Language (DSL)
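The slide shows the paper's DSL, a first-order language over integers and integer arrays with C = 34 functions in total. As a rough stand-in, here is a small illustrative subset rendered as Python equivalents (the names follow the paper; the argument conventions are assumptions):

```python
# An illustrative subset of the DSL, as Python equivalents:
DSL = {
    "HEAD":    lambda xs: xs[0],
    "LAST":    lambda xs: xs[-1],
    "TAKE":    lambda n, xs: xs[:n],
    "DROP":    lambda n, xs: xs[n:],
    "REVERSE": lambda xs: xs[::-1],
    "SORT":    lambda xs: sorted(xs),
    "SUM":     lambda xs: sum(xs),
    "MAP":     lambda f, xs: [f(x) for x in xs],
    "FILTER":  lambda f, xs: [x for x in xs if f(x)],
    "ZIPWITH": lambda f, xs, ys: [f(x, y) for x, y in zip(xs, ys)],
}

# e.g. MAP (*4) (FILTER (<0) a):
print(DSL["MAP"](lambda x: 4 * x, DSL["FILTER"](lambda x: x < 0, [-17, 4, -5])))
# -> [-68, -20]
```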

Page 7:

The Method

1. Define Domain Specific Language (DSL)
2. Generate programs and input-output examples
3. Train neural network to predict functions from input-output pairs
4. Perform program search using classical methods

Page 8:

Generate programs and input-output examples

● Enumerate programs in the DSL.

○ Prune those for which a shorter equivalent program exists (e.g. one with an unused variable).

● Generate inputs and outputs.

○ Constrain outputs to a predetermined range ([-256, 255]) and propagate the constraints back through the program to obtain valid ranges for the inputs. If a range is empty, discard the program.

○ Pick inputs from the pre-computed valid ranges and execute the program to obtain the output values.

● Generate attributes.

○ Use a binary vector to represent which functions were used in the program (see the sketch after this list).
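A minimal sketch of the example and attribute generation, assuming a toy function list and skipping the constraint-propagation step described above:

```python
import random

# FUNCS stands in for all 34 DSL functions.
FUNCS = ["HEAD", "LAST", "TAKE", "DROP", "REVERSE",
         "SORT", "SUM", "MAP", "FILTER", "ZIPWITH"]

def attributes(used):
    """Binary vector marking which DSL functions the program uses."""
    return [1 if f in used else 0 for f in FUNCS]

def make_example(program, low=-256, high=255, length=6):
    """Pick inputs from a valid range and execute the program."""
    xs = [random.randint(low, high) for _ in range(length)]
    return xs, program(xs)

print(attributes({"FILTER", "MAP", "SORT", "REVERSE"}))
# -> [0, 0, 0, 0, 1, 1, 0, 1, 1, 0]
```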

Page 9:

The Method

1. Define Domain Specific Language (DSL)
2. Generate programs and input-output examples
3. Train neural network to predict functions from input-output pairs
4. Perform program search using classical methods

Page 10:

The Network Architecture

“For the encoder we use a simple feed-forward architecture. First, we represent the input and output types (singleton or array) by a one-hot-encoding, and we pad the inputs and outputs to a maximum length L with a special NULL value. Second, each integer in the inputs and in the output is mapped to a learned embedding vector of size E = 20. (The range of integers is restricted to a finite range and each embedding is parametrized individually.) Third, for each input-output example separately, we concatenate the embeddings of the input types, the inputs, the output type, and the output into a single (fixed-length) vector, and pass this vector through H = 3 hidden layers containing K = 256 sigmoid units each. The third hidden layer thus provides an encoding of each individual input-output example. Finally, for input-output examples in a set generated from the same program, we pool these representations together by simple arithmetic averaging.”
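Read as a forward pass, the quoted architecture fits in a few lines of NumPy. This is a sketch under stated assumptions (random stand-in weights, a dedicated NULL embedding row, exactly one array-typed input per example), not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
L, E, K, H = 6, 20, 256, 3                  # dimensions from the quote
NULL = 512                                  # extra embedding row for padding
emb = rng.normal(size=(513, E))             # rows 0..511 = integers -256..255

def embed_padded(values):
    """Pad to length L with NULL, then look up and flatten embeddings."""
    idx = [v + 256 for v in values] + [NULL] * (L - len(values))
    return emb[idx].reshape(-1)             # L * E numbers

def type_one_hot(is_array):
    return np.array([0.0, 1.0]) if is_array else np.array([1.0, 0.0])

in_dim = 2 * (2 + L * E)                    # types + input + output = 244
Ws = [rng.normal(size=(K, in_dim))] + [rng.normal(size=(K, K)) for _ in range(H - 1)]
bs = [np.zeros(K) for _ in range(H)]

def encode_example(inp, out):
    """Encode one (array-typed) input-output example into a 256-dim vector."""
    x = np.concatenate([type_one_hot(True), embed_padded(inp),
                        type_one_hot(True), embed_padded(out)])
    for W, b in zip(Ws, bs):                # H = 3 sigmoid layers
        x = 1.0 / (1.0 + np.exp(-(W @ x + b)))
    return x

# Pool the examples of one program by simple arithmetic averaging:
examples = [([-17, -3, 4], [-12, -20, -68]), ([1, 2, 3], [])]
z = np.mean([encode_example(i, o) for i, o in examples], axis=0)
print(z.shape)                              # (256,)
```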

Page 11:

Network Inputs

“we represent the input and output types (singleton or array) by a one-hot-encoding”

integer → 1 0
array   → 0 1

Page 12:

“each integer in the inputs and in the output is mapped to a learned embedding vector of size E = 20”

NB! Equivalent to multiplying a one-hot vector with a weight matrix, just faster (checked numerically below).

[Figure: learned embedding table, one row of E = 20 learned values for each integer from -256 to 255]
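The equivalence in the note can be checked numerically; the matrix W below is a random stand-in for the learned embedding table:

```python
import numpy as np

num_ints, E = 512, 20                  # integers -256..255, embedding size
W = np.random.randn(num_ints, E)       # stand-in for the learned matrix

v = -17
idx = v + 256                          # shift into row index 0..511
one_hot = np.zeros(num_ints)
one_hot[idx] = 1.0

# Same vector either way; the lookup just skips the multiplication.
assert np.allclose(one_hot @ W, W[idx])
```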

Page 13:

“we pad the inputs and outputs to a maximum length L with a special NULL value”

Array [-17, -3, 4], padded to L = 6: [-17, -3, 4, NULL, NULL, NULL]

[Figure: the embedding row for each element of the padded array]
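A one-line padding helper, with NULL as a sentinel that (by assumption) gets its own embedding row:

```python
L = 6

def pad(xs, L=L, NULL="NULL"):
    """Pad a list to the fixed maximum length with the NULL sentinel."""
    return xs + [NULL] * (L - len(xs))

print(pad([-17, -3, 4]))   # [-17, -3, 4, 'NULL', 'NULL', 'NULL']
```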

Page 14:

“for each input-output example separately, we concatenate the embeddings of the input types, the inputs, the output type, and the output into a single (fixed-length) vector”

[Figure: the type one-hots (integer → 1 0, array → 0 1) concatenated with the embeddings of the padded input and output into one fixed-length vector]
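For the dimensions on these slides, the length of the concatenated vector works out as follows (assuming one array-typed input and one array-typed output; more inputs would add more blocks):

```python
L, E = 6, 20                     # max length, embedding size
type_oh = 2                      # one-hot over {integer, array}
vec_len = 2 * (type_oh + L * E)  # input block + output block
print(vec_len)                   # 244
```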

Page 15:

“... and pass this vector through H = 3 hidden layers containing K = 256 sigmoid units each”

[Figure: feed-forward network with an input layer x, bias inputs, learned weights w(1), w(2), w(3) and biases b(1), b(2), b(3), and hidden layers h(1), h(2), h(3); the third hidden layer is the encoding or representation (a vector of length 256)]

Page 16:

“... and pass this vector through H = 3 hidden layers containing K = 256 sigmoid units each”

Matrix notation:

h(k) = σ(W(k) · h(k−1) + b(k))

where h(k) is a 256×1 vector, W(k) a 256×256 matrix (learned), and b(k) a 256×1 vector (learned).

Page 17:

“Finally, for input-output examples in a set generated from the same program, we pool these representations together by simple arithmetic averaging.”

sample 1:    0.3  1.8 -0.8  1.3 …
sample 2:    1.1 -0.8 -0.1  2.1 …
sample 3:   -0.2  1.4  0.3 -0.1 …
avg-pooled:  0.4  0.8 -0.2  1.1 …

(each row is a vector of length 256; only the first four values are shown)
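The same pooling, reproduced with the slide's numbers:

```python
import numpy as np

# Only the first 4 of the 256 dimensions are shown.
encodings = np.array([
    [ 0.3,  1.8, -0.8,  1.3],   # sample 1
    [ 1.1, -0.8, -0.1,  2.1],   # sample 2
    [-0.2,  1.4,  0.3, -0.1],   # sample 3
])
print(encodings.mean(axis=0))   # [ 0.4  0.8 -0.2  1.1]
```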

Page 18:

“We use a decoder that pre-multiplies the encoding of input-output examples by a learned CxK matrix, where C = 34 is the number of functions in our DSL...”

[Figure: output layer: the pooled values z and a bias input are combined through learned weights w(4) and biases b(4) into the probabilities of functions (a vector of length 34)]

Page 19:

“... and treats the resulting C numbers as log-unnormalized probabilities (logits) of each function appearing in the source code.”
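A sketch of the decoder under the natural reading of the quote: sigmoids turn the 34 logits into independent per-function probabilities (multi-label, since several functions can appear in one program). Random values stand in for the learned parameters:

```python
import numpy as np

C, K = 34, 256
W_dec = np.random.randn(C, K)     # learned C x K matrix
b_dec = np.zeros(C)               # learned bias (per the slide's figure)
z = np.random.randn(K)            # pooled encoding from the encoder

logits = W_dec @ z + b_dec        # "log-unnormalized probabilities"
probs = 1.0 / (1.0 + np.exp(-logits))
print(probs.shape)                # (34,)
```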

Page 20:

The Final Architecture

Page 21:

“We use the negative cross entropy loss to train the neural network”

Minimizing this loss is equivalent to maximizing the log-likelihood of the true attribute vector; minimization is simply the convention that optimization libraries implement (a sketch of the loss follows).
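A sketch of that loss for the multi-label attribute setting (the sum reduction over functions is an assumption):

```python
import numpy as np

def xent_loss(probs, attrs, eps=1e-9):
    """Cross-entropy between predicted probabilities and binary attributes."""
    probs = np.clip(probs, eps, 1 - eps)
    return -np.sum(attrs * np.log(probs) + (1 - attrs) * np.log(1 - probs))

attrs = np.array([1, 0, 0, 1])            # e.g. FILTER and SORT were used
probs = np.array([0.9, 0.2, 0.1, 0.7])    # network outputs
print(xent_loss(probs, attrs))            # low when predictions match
```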

Page 22:

Training using gradient descent

Take the derivative of the loss function with respect to each weight to see how the loss changes when you change that weight, then change the weight so that the loss decreases.

In matrix notation:

W ← W − α · ∂L/∂W

Here α is a learning rate that must be manually tuned.
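A minimal numerical illustration of this update rule on a toy quadratic loss (not the authors' training code):

```python
import numpy as np

# Minimize L(w) = ||w - t||^2, whose gradient is dL/dw = 2 (w - t).
t = np.array([1.0, -2.0])
w = np.zeros(2)
alpha = 0.1                    # learning rate, manually tuned
for _ in range(100):
    grad = 2 * (w - t)         # derivative of the loss w.r.t. the weights
    w = w - alpha * grad       # the update rule from the slide
print(w)                       # converges to t = [ 1. -2.]
```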

Page 23:

Demo

Page 24:

The Method

1. Define Domain Specific Language (DSL)
2. Generate programs and input-output examples
3. Train neural network to predict functions from input-output pairs
4. Perform program search using classical methods

Page 25:

Program search

● Depth-first search (DFS) - consider the functions ordered by their predicted probabilities from the neural network.

● “Sort and add” enumeration - maintains a set of active functions and performs DFS with the active function set only. Whenever the search fails, the next most probable function (or several) are added to the active set and the search restarts with this larger active set (see the sketch after this list).

● Sketch - SMT-based program synthesis tool. Sketch can utilize the neural network predictions in a Sort and add scheme as described above, as the possibilities for each function hole can be restricted to the current active set.

● λ2 - can be used in our framework using a Sort and add scheme as described above by choosing the library of functions according to the neural network predictions.
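A toy sketch of the “sort and add” loop; `run` (a DSL interpreter) is assumed, and programs are simplified to plain function sequences without arguments or lambdas:

```python
from itertools import product

def sort_and_add(funcs_by_prob, examples, run, max_len, start=3):
    """funcs_by_prob: functions sorted by predicted probability (descending)."""
    k = start
    while k <= len(funcs_by_prob):
        active = funcs_by_prob[:k]           # most probable functions first
        for length in range(1, max_len + 1): # DFS over the active set only
            for prog in product(active, repeat=length):
                if all(run(prog, xs) == ys for xs, ys in examples):
                    return prog              # first consistent program found
        k += 1                               # search failed: widen the set
    return None                              # exhausted the whole DSL
```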

Page 26:

The Experiments

Page 27:

Programs of length T=3

● The test set was guaranteed to be semantically disjoint from all programs on which the neural network was trained (we have ensured that all test programs behave differently from all programs used during training on at least one input).

● As a baseline, we also ran all search procedures using a simple prior as function probabilities, computed from their global incidence in the program corpus.

Page 28:

Programs of length T=5

Page 29:

Example predictions

Page 30:

Embeddings

[Figure]

Page 31:

Confusion Matrix (T=3)

[Figure]

Page 32:

Confusion Matrix (T=5)

[Figure]

Page 33:

Alternative Model

● Encoder: GRU-based RNN
● Decoder: RNN trained to predict the entire program token-by-token
● Beam search was used to explore likely programs predicted by the RNN

It only led to a solution comparable with the other techniques when searching for programs of length T ≤ 2, where the search space size is very small (on the order of 10^3).

We do not rule out that a more sophisticated RNN decoder or training procedure could possibly be more successful.

Page 34:

Thank you!
[email protected]