running probabilistic programs backwardsntoronto/papers/toronto-2015esop-slides.p… · •...

Running ProbabilisticRunning ProbabilisticRunning Probabilistic

Programs BackwardsPrograms BackwardsPrograms Backwards

Neil Toronto * Jay McCarthy † David Van Horn *

* University of Maryland † Vassar College

ESOP 2015

2015/04/14

RoadmapRoadmapRoadmap

• Probabilistic inference, and why it’s hard

1111111111111111



• Limitations of current probabilistic programming languages(PPLs)

1111111111111111




• Contributions

1111111111111111




• Contributions

Uncomputable, compositional ways to not limit language

1111111111111111




• Contributions

Uncomputable, compositional ways to not limit language

Computable, compositional ways to not limit language

1111111111111111

Programming Coin FlipsProgramming Coin FlipsProgramming Coin Flips

(let ([x (flip 0.5)]) x)

2222222222222222


(let ([x (flip 0.5)]) x)

0.5

2222222222222222


(let ([x (flip 0.5)]) x)

0.5

0.5

2222222222222222


(let ([x (flip 0.5)][y (flip 0.5)])

(cons x y))

0.5

0.5 ⟨ , ⟩0.5 ⟨ , ⟩

2222222222222222


(let ([x (flip 0.5)][y (flip 0.5)])

(cons x y))

0.5 0.5

0.5 ⟨ , ⟩ ⟨ , ⟩0.5 ⟨ , ⟩ ⟨ , ⟩

2222222222222222


(let* ([x (flip 0.5)][y (flip (if (equal? x heads) 0.5 0.3))])

(cons x y))

0.5 0.5

0.5 ⟨ , ⟩ ⟨ , ⟩0.5 ⟨ , ⟩ ⟨ , ⟩

2222222222222222


(let* ([x (flip 0.5)][y (flip (if (equal? x heads) 0.5 0.3))])

(cons x y))

0.5 0.5

0.5 ⟨ , ⟩ ⟨ , ⟩0.5 ⟨ , ⟩ ⟨ , ⟩

0.3 0.72222222222222222


0.5 0.5

0.5 ⟨ , ⟩ ⟨ , ⟩0.5 ⟨ , ⟩ ⟨ , ⟩

0.3 0.72222222222222222


0.5

0.5 ⟨ , ⟩0.5 ⟨ , ⟩

0.32222222222222222

Stochastic Ray TracingStochastic Ray TracingStochastic Ray Tracing

sto·cha·stic /stō-ˈkas-tik/ adj. fancy word for "randomized"

2222222222222222


ap·er·ture /ˈap-ə(r)-chər/ n. fancy word for "opening"

2222222222222222


Simulate projecting rays onto a sensor...

2222222222222222


... and collect them to form an image

2222222222222222

Programming Stochastic Ray TracingProgramming Stochastic Ray TracingProgramming Stochastic Ray Tracing

• Normally thousands of lines of code

2222222222222222



• Bears little resemblance to the physical process

2222222222222222




• In DrBayes, it’s simple physics simulation:

(define/drbayes (ray-plane-intersect p0 v n d) (let ([denom (- (dot v n))])

(if (> denom 0)(let ([t (/ (+ d (dot p0 n)) denom)]) (if (> t 0)

(collision t (vec+ p0 (vec* v t)) n)#f))

#f)))

2222222222222222








#f)))

• Other PPLs really aren’t up to this yet

2222222222222222








#f)))

• Other PPLs really aren’t up to this yet

• The issue is one of theory, not engineering effort

2222222222222222

Simpler ExampleSimpler ExampleSimpler Example

• Assume (random) returns a value uniformly in

3333333333333333



Density function for value of (random):

xxxxxxxxx

p(x

)p(x

)p(x

)p(x

)p(x

)p(x

)p(x

)p(x

)p(x

)

000000000 .25.25.25.25.25.25.25.25.25 .5.5.5.5.5.5.5.5.5 .75.75.75.75.75.75.75.75.75 111111111 1.251.251.251.251.251.251.251.251.25000000000

.25.25.25.25.25.25.25.25.25

.5.5.5.5.5.5.5.5.5

.75.75.75.75.75.75.75.75.75

111111111

1.251.251.251.251.251.251.251.251.25

3333333333333333



Density function for value of (max 0.5 (random)):

xxxxxxxxx

pₘ(x

)pₘ(x

)pₘ(x

)pₘ(x

)pₘ(x

)pₘ(x

)pₘ(x

)pₘ(x

)pₘ(x

)

000000000 .25.25.25.25.25.25.25.25.25 .5.5.5.5.5.5.5.5.5 .75.75.75.75.75.75.75.75.75 111111111 1.251.251.251.251.251.251.251.251.25000000000

.25.25.25.25.25.25.25.25.25

.5.5.5.5.5.5.5.5.5

.75.75.75.75.75.75.75.75.75

111111111

1.251.251.251.251.251.251.251.251.25

pₘ(x) = ??? pₘ(x) = ??? pₘ(x) = ??? pₘ(x) = ??? pₘ(x) = ??? pₘ(x) = ??? pₘ(x) = ??? pₘ(x) = ??? pₘ(x) = ???

3333333333333333



Density function for value of (max 0.5 (random)):

xxxxxxxxx

pₘ(x

)pₘ(x

)pₘ(x

)pₘ(x

)pₘ(x

)pₘ(x

)pₘ(x

)pₘ(x

)pₘ(x

)

000000000 .25.25.25.25.25.25.25.25.25 .5.5.5.5.5.5.5.5.5 .75.75.75.75.75.75.75.75.75 111111111 1.251.251.251.251.251.251.251.251.25000000000

.25.25.25.25.25.25.25.25.25

.5.5.5.5.5.5.5.5.5

.75.75.75.75.75.75.75.75.75

111111111

1.251.251.251.251.251.251.251.251.25

pₘ(x) doesn't exist pₘ(x) doesn't exist pₘ(x) doesn't exist pₘ(x) doesn't exist pₘ(x) doesn't exist pₘ(x) doesn't exist pₘ(x) doesn't exist pₘ(x) doesn't exist pₘ(x) doesn't exist

3333333333333333

What Can’t Densities Model?What Can’t Densities Model?What Can’t Densities Model?

4444444444444444


• Results of discontinuous functions (bounded measuring devices)

(let ([temperature (normal 99 1)]) (min 100 temperature))

4444444444444444




• Variable-dimensional things (union types)

(if test? none (just x))

4444444444444444






• Infinite-dimensional things (recursion)

4444444444444444






• Infinite-dimensional things (recursion)

• In general: the distributions of program values

4444444444444444

Probability MeasuresProbability MeasuresProbability Measures

• Like already-integrated densities, but a primitive concept

5555555555555555



• Measure of (random) is , defined by

5555555555555555




• Measure of (max 0.5 (random)) defined by

5555555555555555





This term assigns probability

5555555555555555





This term assigns probability

• Need a way to derive measures from code

5555555555555555

Probability Measures Via PreimagesProbability Measures Via PreimagesProbability Measures Via Preimages

• Interpret (max 0.5 (random)) as , defined

6666666666666666



• Derive measure of (max 0.5 (random)) as

6666666666666666




where

6666666666666666




where

• Factored into random and deterministic parts:

6666666666666666




where

• Factored into random and deterministic parts:

• In other words, compute measures of expressions by runningthem backwards

6666666666666666

Crazy Idea is Feasible If...Crazy Idea is Feasible If...Crazy Idea is Feasible If...

• Seems like we need:

7777777777777777



Standard interpretation of programs as pure functions from arandom source

7777777777777777




Efficient way to compute preimage sets

7777777777777777





Efficient representation of arbitrary sets

7777777777777777






Efficient way to compute areas of preimage sets

7777777777777777







Proof of correctness w.r.t. standard interpretation

7777777777777777







Proof of correctness w.r.t. standard interpretation

• WAT

7777777777777777

What About Approximating?What About Approximating?What About Approximating?

Conservative approximation with rectangles:

8888888888888888


Restricting preimages to rectangular subdomains:

9999999999999999


Sampling: exponential to quadratic (e.g. days to minutes)

10101010101010101010101010101010

Contribution: Making This Crazy Idea FeasibleContribution: Making This Crazy Idea FeasibleContribution: Making This Crazy Idea Feasible

• Standard interpretation of programs as pure functions from arandom source

• Efficient way to compute preimage sets

• Efficient representation of arbitrary sets

• Efficient way to compute volumes of preimage sets

• Proof of correctness w.r.t. standard interpretation

11111111111111111111111111111111



• Efficient way to compute abstract preimage subsets

• Efficient representation of arbitrary sets



11111111111111111111111111111111




• Efficient representation of abstract sets



11111111111111111111111111111111





• Efficient way to sample uniformly in preimage sets


11111111111111111111111111111111






Efficient domain partition sampling


11111111111111111111111111111111






Efficient domain partition sampling

Efficient way to determine whether a domain sample isactually in the preimage (just use standard interpretation)


11111111111111111111111111111111

How Many Meanings?How Many Meanings?How Many Meanings?

• Start with pure programs, then lift by threading a random store

11111111111111111111111111111111



• Nonrecursive, nonprobabilistic programs: , ,

11111111111111111111111111111111




• Add 3 semantic functions for recursion and probabilistic choice

11111111111111111111111111111111





• Full development needs 2 more to transfer theorems frommeasure theory...

11111111111111111111111111111111






• ... oh, and 1 more to collect information for Monte Carlointegration

11111111111111111111111111111111






• ... oh, and 1 more to collect information for Monte Carlointegration

Tally: 3+3+2+1 = 9 semantic functions, 11 or 12 rules each

11111111111111111111111111111111

Enter Category TheoryEnter Category TheoryEnter Category Theory

• Moggi (1989): Introduces monads for interpreting effects

11111111111111111111111111111111



• Other kinds of categories: idioms, arrows

11111111111111111111111111111111




• Arrow defined by type constructor and thesecombinators:

11111111111111111111111111111111




• Arrow defined by type constructor and thesecombinators:

• Arrows are always function-like

11111111111111111111111111111111

Reducing ComplexityReducing ComplexityReducing Complexity

Function arrow: is just

11111111111111111111111111111111

Pair PreimagesPair PreimagesPair Preimages

12121212121212121212121212121212


:

12121212121212121212121212121212


and :

12121212121212121212121212121212


:

12121212121212121212121212121212

Correctness Theorems For Low, Low PricesCorrectness Theorems For Low, Low PricesCorrectness Theorems For Low, Low Prices

• Define

12121212121212121212121212121212


• Define

• Derive and others so that distributes; e.g.

12121212121212121212121212121212


• Define


• Distributive properties makes proving this very easy:

Theorem (correctness). For all , .

12121212121212121212121212121212


• Define




In English: computes preimages under .

12121212121212121212121212121212


• Define





• Other correctness proofs are similarly easy: prove 5 distributiveproperties

12121212121212121212121212121212


• Define





• Other correctness proofs are similarly easy: prove 5 distributiveproperties

• Can add (random) and recursion to all semantics in one shot

12121212121212121212121212121212

AbstractionAbstractionAbstraction

Rectangle: An interval or union of intervals, a subset of , or for rectangles and

13131313131313131313131313131313



• Easy representation; easy intersection, join (whichoverapproximates union), empty test, etc.

13131313131313131313131313131313




• Define (and therefore ) by replacing sets and setoperations with rectangles and rectangle operations

13131313131313131313131313131313




• Define (and therefore ) by replacing sets and setoperations with rectangles and rectangle operations

• Recursion is somewhat tricky—requires fine control overrecursion depth or if choices

13131313131313131313131313131313

In Theory...In Theory...In Theory...

Theorem (sound). computes overapproximations of thepreimages computed by .

• Consequence: Sampling in abstract preimages doesn’t leaveanything out

14141414141414141414141414141414




Theorem (decreasing). never returns preimages larger thanthe given subdomain.

• Consequence: Refining abstract preimage sets never results in aworse approximation

14141414141414141414141414141414




Theorem (decreasing). never returns preimages larger thanthe given subdomain.

• Consequence: Refining abstract preimage sets never results in aworse approximation

Theorem (monotone). is monotone.

• Consequence: Partitioning and then refining never results in aworse approximation 14141414141414141414141414141414

In Practice...In Practice...In Practice...

Theorems prove this always works:

15151515151515151515151515151515

Program Domain ValuesProgram Domain ValuesProgram Domain Values

16161616161616161616161616161616


• Program inputs are infinite binary trees:

16161616161616161616161616161616



• Every expression in a program is assigned a node

16161616161616161616161616161616




• Implemented using lazy trees of random values

16161616161616161616161616161616




• Implemented using lazy trees of random values

• No probability density for domain, but there is a measure 16161616161616161616161616161616

Example: Stochastic Ray TracingExample: Stochastic Ray TracingExample: Stochastic Ray Tracing

17171717171717171717171717171717

Example: Probabilistic VerificationExample: Probabilistic VerificationExample: Probabilistic Verification

(struct/drbayes float-any ())(struct/drbayes float (value error))

18181818181818181818181818181818



(define/drbayes (flsqrt x) (if (float-any? x)

x(let ([v (float-value x)]

[e (float-error x)]) (cond [(negative? v) (float-any)]

[(zero? v) (float 0 0)][else (float (sqrt v)

(+ (- 1 (sqrt (- 1 e)))(* 1/2 epsilon)))]))))

18181818181818181818181818181818







(+ (- 1 (sqrt (- 1 e)))(* 1/2 epsilon)))]))))

• Idea: sample e where (> (float-error e) threshold)

18181818181818181818181818181818







(+ (- 1 (sqrt (- 1 e)))(* 1/2 epsilon)))]))))

• Idea: sample e where (> (float-error e) threshold)

• Verified flhypot, flsqrt1pm1, flsinh in Racket’s mathlibrary, as well as others

18181818181818181818181818181818

Examples: Other Inference TasksExamples: Other Inference TasksExamples: Other Inference Tasks

• Typical Bayesian inference

Hierarchical models

Bayesian regression

Model selection

19191919191919191919191919191919

Examples: Other Inference TasksExamples: Other Inference TasksExamples: Other Inference Tasks

• Typical Bayesian inference

Hierarchical models

Bayesian regression

Model selection

• Atypical

Programs that halt with probability < 1, or never halt

Probabilistic context-free grammars with context-sensitiveconstraints

19191919191919191919191919191919

SummarySummarySummary

• Probabilistic inference is hard, so PPLs have been popping up

20202020202020202020202020202020



• Interpreting every program requires measure theory

20202020202020202020202020202020




• Defined a semantics that computes preimages

20202020202020202020202020202020





• Measuring abstract preimages or sampling in them carries outinference

20202020202020202020202020202020





• Measuring abstract preimages or sampling in them carries outinference

• Can do a lot of cool stuff that’s normally inaccessible

20202020202020202020202020202020

https://github.com/ntoronto/drbayes

21212121212121212121212121212121

running probabilistic programs backwardsntoronto/papers/toronto-2015esop-slides.p… · •...

Documents