running probabilistic programs backwardsntoronto/papers/toronto-2015esop-slides.p… · •...
TRANSCRIPT
Running ProbabilisticRunning ProbabilisticRunning Probabilistic
Programs BackwardsPrograms BackwardsPrograms Backwards
Neil Toronto * Jay McCarthy † David Van Horn *
* University of Maryland † Vassar College
ESOP 2015
2015/04/14
RoadmapRoadmapRoadmap
• Probabilistic inference, and why it’s hard
1111111111111111
RoadmapRoadmapRoadmap
• Probabilistic inference, and why it’s hard
• Limitations of current probabilistic programming languages(PPLs)
1111111111111111
RoadmapRoadmapRoadmap
• Probabilistic inference, and why it’s hard
• Limitations of current probabilistic programming languages(PPLs)
• Contributions
1111111111111111
RoadmapRoadmapRoadmap
• Probabilistic inference, and why it’s hard
• Limitations of current probabilistic programming languages(PPLs)
• Contributions
Uncomputable, compositional ways to not limit language
1111111111111111
RoadmapRoadmapRoadmap
• Probabilistic inference, and why it’s hard
• Limitations of current probabilistic programming languages(PPLs)
• Contributions
Uncomputable, compositional ways to not limit language
Computable, compositional ways to not limit language
1111111111111111
Programming Coin FlipsProgramming Coin FlipsProgramming Coin Flips
(let ([x (flip 0.5)]) x)
2222222222222222
Programming Coin FlipsProgramming Coin FlipsProgramming Coin Flips
(let ([x (flip 0.5)]) x)
0.5
2222222222222222
Programming Coin FlipsProgramming Coin FlipsProgramming Coin Flips
(let ([x (flip 0.5)]) x)
0.5
0.5
2222222222222222
Programming Coin FlipsProgramming Coin FlipsProgramming Coin Flips
(let ([x (flip 0.5)][y (flip 0.5)])
(cons x y))
0.5
0.5 ⟨ , ⟩0.5 ⟨ , ⟩
2222222222222222
Programming Coin FlipsProgramming Coin FlipsProgramming Coin Flips
(let ([x (flip 0.5)][y (flip 0.5)])
(cons x y))
0.5 0.5
0.5 ⟨ , ⟩ ⟨ , ⟩0.5 ⟨ , ⟩ ⟨ , ⟩
2222222222222222
Programming Coin FlipsProgramming Coin FlipsProgramming Coin Flips
(let ([x (flip 0.5)][y (flip 0.5)])
(cons x y))
0.5 0.5
0.5 ⟨ , ⟩ ⟨ , ⟩0.5 ⟨ , ⟩ ⟨ , ⟩
2222222222222222
Programming Coin FlipsProgramming Coin FlipsProgramming Coin Flips
(let* ([x (flip 0.5)][y (flip (if (equal? x heads) 0.5 0.3))])
(cons x y))
0.5 0.5
0.5 ⟨ , ⟩ ⟨ , ⟩0.5 ⟨ , ⟩ ⟨ , ⟩
2222222222222222
Programming Coin FlipsProgramming Coin FlipsProgramming Coin Flips
(let* ([x (flip 0.5)][y (flip (if (equal? x heads) 0.5 0.3))])
(cons x y))
0.5 0.5
0.5 ⟨ , ⟩ ⟨ , ⟩0.5 ⟨ , ⟩ ⟨ , ⟩
0.3 0.72222222222222222
Programming Coin FlipsProgramming Coin FlipsProgramming Coin Flips
0.5 0.5
0.5 ⟨ , ⟩ ⟨ , ⟩0.5 ⟨ , ⟩ ⟨ , ⟩
0.3 0.72222222222222222
Programming Coin FlipsProgramming Coin FlipsProgramming Coin Flips
0.5 0.5
0.5 ⟨ , ⟩ ⟨ , ⟩0.5 ⟨ , ⟩ ⟨ , ⟩
0.3 0.72222222222222222
Programming Coin FlipsProgramming Coin FlipsProgramming Coin Flips
0.5
0.5 ⟨ , ⟩0.5 ⟨ , ⟩
0.32222222222222222
Stochastic Ray TracingStochastic Ray TracingStochastic Ray Tracing
sto·cha·stic /stō-ˈkas-tik/ adj. fancy word for "randomized"
2222222222222222
Stochastic Ray TracingStochastic Ray TracingStochastic Ray Tracing
sto·cha·stic /stō-ˈkas-tik/ adj. fancy word for "randomized"
2222222222222222
Stochastic Ray TracingStochastic Ray TracingStochastic Ray Tracing
ap·er·ture /ˈap-ə(r)-chər/ n. fancy word for "opening"
2222222222222222
Stochastic Ray TracingStochastic Ray TracingStochastic Ray Tracing
ap·er·ture /ˈap-ə(r)-chər/ n. fancy word for "opening"
2222222222222222
Stochastic Ray TracingStochastic Ray TracingStochastic Ray Tracing
Simulate projecting rays onto a sensor...
2222222222222222
Stochastic Ray TracingStochastic Ray TracingStochastic Ray Tracing
... and collect them to form an image
2222222222222222
Programming Stochastic Ray TracingProgramming Stochastic Ray TracingProgramming Stochastic Ray Tracing
• Normally thousands of lines of code
2222222222222222
Programming Stochastic Ray TracingProgramming Stochastic Ray TracingProgramming Stochastic Ray Tracing
• Normally thousands of lines of code
• Bears little resemblance to the physical process
2222222222222222
Programming Stochastic Ray TracingProgramming Stochastic Ray TracingProgramming Stochastic Ray Tracing
• Normally thousands of lines of code
• Bears little resemblance to the physical process
• In DrBayes, it’s simple physics simulation:
(define/drbayes (ray-plane-intersect p0 v n d) (let ([denom (- (dot v n))])
(if (> denom 0)(let ([t (/ (+ d (dot p0 n)) denom)]) (if (> t 0)
(collision t (vec+ p0 (vec* v t)) n)#f))
#f)))
2222222222222222
Programming Stochastic Ray TracingProgramming Stochastic Ray TracingProgramming Stochastic Ray Tracing
• Normally thousands of lines of code
• Bears little resemblance to the physical process
• In DrBayes, it’s simple physics simulation:
(define/drbayes (ray-plane-intersect p0 v n d) (let ([denom (- (dot v n))])
(if (> denom 0)(let ([t (/ (+ d (dot p0 n)) denom)]) (if (> t 0)
(collision t (vec+ p0 (vec* v t)) n)#f))
#f)))
• Other PPLs really aren’t up to this yet
2222222222222222
Programming Stochastic Ray TracingProgramming Stochastic Ray TracingProgramming Stochastic Ray Tracing
• Normally thousands of lines of code
• Bears little resemblance to the physical process
• In DrBayes, it’s simple physics simulation:
(define/drbayes (ray-plane-intersect p0 v n d) (let ([denom (- (dot v n))])
(if (> denom 0)(let ([t (/ (+ d (dot p0 n)) denom)]) (if (> t 0)
(collision t (vec+ p0 (vec* v t)) n)#f))
#f)))
• Other PPLs really aren’t up to this yet
• The issue is one of theory, not engineering effort
2222222222222222
Simpler ExampleSimpler ExampleSimpler Example
• Assume (random) returns a value uniformly in
3333333333333333
Simpler ExampleSimpler ExampleSimpler Example
• Assume (random) returns a value uniformly in
Density function for value of (random):
xxxxxxxxx
p(x
)p(x
)p(x
)p(x
)p(x
)p(x
)p(x
)p(x
)p(x
)
000000000 .25.25.25.25.25.25.25.25.25 .5.5.5.5.5.5.5.5.5 .75.75.75.75.75.75.75.75.75 111111111 1.251.251.251.251.251.251.251.251.25000000000
.25.25.25.25.25.25.25.25.25
.5.5.5.5.5.5.5.5.5
.75.75.75.75.75.75.75.75.75
111111111
1.251.251.251.251.251.251.251.251.25
3333333333333333
Simpler ExampleSimpler ExampleSimpler Example
• Assume (random) returns a value uniformly in
Density function for value of (random):
xxxxxxxxx
p(x
)p(x
)p(x
)p(x
)p(x
)p(x
)p(x
)p(x
)p(x
)
000000000 .25.25.25.25.25.25.25.25.25 .5.5.5.5.5.5.5.5.5 .75.75.75.75.75.75.75.75.75 111111111 1.251.251.251.251.251.251.251.251.25000000000
.25.25.25.25.25.25.25.25.25
.5.5.5.5.5.5.5.5.5
.75.75.75.75.75.75.75.75.75
111111111
1.251.251.251.251.251.251.251.251.25
3333333333333333
Simpler ExampleSimpler ExampleSimpler Example
• Assume (random) returns a value uniformly in
Density function for value of (random):
xxxxxxxxx
p(x
)p(x
)p(x
)p(x
)p(x
)p(x
)p(x
)p(x
)p(x
)
000000000 .25.25.25.25.25.25.25.25.25 .5.5.5.5.5.5.5.5.5 .75.75.75.75.75.75.75.75.75 111111111 1.251.251.251.251.251.251.251.251.25000000000
.25.25.25.25.25.25.25.25.25
.5.5.5.5.5.5.5.5.5
.75.75.75.75.75.75.75.75.75
111111111
1.251.251.251.251.251.251.251.251.25
3333333333333333
Simpler ExampleSimpler ExampleSimpler Example
• Assume (random) returns a value uniformly in
Density function for value of (max 0.5 (random)):
xxxxxxxxx
pₘ(x
)pₘ(x
)pₘ(x
)pₘ(x
)pₘ(x
)pₘ(x
)pₘ(x
)pₘ(x
)pₘ(x
)
000000000 .25.25.25.25.25.25.25.25.25 .5.5.5.5.5.5.5.5.5 .75.75.75.75.75.75.75.75.75 111111111 1.251.251.251.251.251.251.251.251.25000000000
.25.25.25.25.25.25.25.25.25
.5.5.5.5.5.5.5.5.5
.75.75.75.75.75.75.75.75.75
111111111
1.251.251.251.251.251.251.251.251.25
pₘ(x) = ??? pₘ(x) = ??? pₘ(x) = ??? pₘ(x) = ??? pₘ(x) = ??? pₘ(x) = ??? pₘ(x) = ??? pₘ(x) = ??? pₘ(x) = ???
3333333333333333
Simpler ExampleSimpler ExampleSimpler Example
• Assume (random) returns a value uniformly in
Density function for value of (max 0.5 (random)):
xxxxxxxxx
pₘ(x
)pₘ(x
)pₘ(x
)pₘ(x
)pₘ(x
)pₘ(x
)pₘ(x
)pₘ(x
)pₘ(x
)
000000000 .25.25.25.25.25.25.25.25.25 .5.5.5.5.5.5.5.5.5 .75.75.75.75.75.75.75.75.75 111111111 1.251.251.251.251.251.251.251.251.25000000000
.25.25.25.25.25.25.25.25.25
.5.5.5.5.5.5.5.5.5
.75.75.75.75.75.75.75.75.75
111111111
1.251.251.251.251.251.251.251.251.25
pₘ(x) = ??? pₘ(x) = ??? pₘ(x) = ??? pₘ(x) = ??? pₘ(x) = ??? pₘ(x) = ??? pₘ(x) = ??? pₘ(x) = ??? pₘ(x) = ???
3333333333333333
Simpler ExampleSimpler ExampleSimpler Example
• Assume (random) returns a value uniformly in
Density function for value of (max 0.5 (random)):
xxxxxxxxx
pₘ(x
)pₘ(x
)pₘ(x
)pₘ(x
)pₘ(x
)pₘ(x
)pₘ(x
)pₘ(x
)pₘ(x
)
000000000 .25.25.25.25.25.25.25.25.25 .5.5.5.5.5.5.5.5.5 .75.75.75.75.75.75.75.75.75 111111111 1.251.251.251.251.251.251.251.251.25000000000
.25.25.25.25.25.25.25.25.25
.5.5.5.5.5.5.5.5.5
.75.75.75.75.75.75.75.75.75
111111111
1.251.251.251.251.251.251.251.251.25
pₘ(x) = ??? pₘ(x) = ??? pₘ(x) = ??? pₘ(x) = ??? pₘ(x) = ??? pₘ(x) = ??? pₘ(x) = ??? pₘ(x) = ??? pₘ(x) = ???
3333333333333333
Simpler ExampleSimpler ExampleSimpler Example
• Assume (random) returns a value uniformly in
Density function for value of (max 0.5 (random)):
xxxxxxxxx
pₘ(x
)pₘ(x
)pₘ(x
)pₘ(x
)pₘ(x
)pₘ(x
)pₘ(x
)pₘ(x
)pₘ(x
)
000000000 .25.25.25.25.25.25.25.25.25 .5.5.5.5.5.5.5.5.5 .75.75.75.75.75.75.75.75.75 111111111 1.251.251.251.251.251.251.251.251.25000000000
.25.25.25.25.25.25.25.25.25
.5.5.5.5.5.5.5.5.5
.75.75.75.75.75.75.75.75.75
111111111
1.251.251.251.251.251.251.251.251.25
pₘ(x) doesn't exist pₘ(x) doesn't exist pₘ(x) doesn't exist pₘ(x) doesn't exist pₘ(x) doesn't exist pₘ(x) doesn't exist pₘ(x) doesn't exist pₘ(x) doesn't exist pₘ(x) doesn't exist
3333333333333333
What Can’t Densities Model?What Can’t Densities Model?What Can’t Densities Model?
4444444444444444
What Can’t Densities Model?What Can’t Densities Model?What Can’t Densities Model?
• Results of discontinuous functions (bounded measuring devices)
(let ([temperature (normal 99 1)]) (min 100 temperature))
4444444444444444
What Can’t Densities Model?What Can’t Densities Model?What Can’t Densities Model?
• Results of discontinuous functions (bounded measuring devices)
(let ([temperature (normal 99 1)]) (min 100 temperature))
• Variable-dimensional things (union types)
(if test? none (just x))
4444444444444444
What Can’t Densities Model?What Can’t Densities Model?What Can’t Densities Model?
• Results of discontinuous functions (bounded measuring devices)
(let ([temperature (normal 99 1)]) (min 100 temperature))
• Variable-dimensional things (union types)
(if test? none (just x))
• Infinite-dimensional things (recursion)
4444444444444444
What Can’t Densities Model?What Can’t Densities Model?What Can’t Densities Model?
• Results of discontinuous functions (bounded measuring devices)
(let ([temperature (normal 99 1)]) (min 100 temperature))
• Variable-dimensional things (union types)
(if test? none (just x))
• Infinite-dimensional things (recursion)
• In general: the distributions of program values
4444444444444444
Probability MeasuresProbability MeasuresProbability Measures
• Like already-integrated densities, but a primitive concept
5555555555555555
Probability MeasuresProbability MeasuresProbability Measures
• Like already-integrated densities, but a primitive concept
• Measure of (random) is , defined by
5555555555555555
Probability MeasuresProbability MeasuresProbability Measures
• Like already-integrated densities, but a primitive concept
• Measure of (random) is , defined by
5555555555555555
Probability MeasuresProbability MeasuresProbability Measures
• Like already-integrated densities, but a primitive concept
• Measure of (random) is , defined by
• Measure of (max 0.5 (random)) defined by
5555555555555555
Probability MeasuresProbability MeasuresProbability Measures
• Like already-integrated densities, but a primitive concept
• Measure of (random) is , defined by
• Measure of (max 0.5 (random)) defined by
This term assigns probability
5555555555555555
Probability MeasuresProbability MeasuresProbability Measures
• Like already-integrated densities, but a primitive concept
• Measure of (random) is , defined by
• Measure of (max 0.5 (random)) defined by
This term assigns probability
• Need a way to derive measures from code
5555555555555555
Probability Measures Via PreimagesProbability Measures Via PreimagesProbability Measures Via Preimages
• Interpret (max 0.5 (random)) as , defined
6666666666666666
Probability Measures Via PreimagesProbability Measures Via PreimagesProbability Measures Via Preimages
• Interpret (max 0.5 (random)) as , defined
• Derive measure of (max 0.5 (random)) as
6666666666666666
Probability Measures Via PreimagesProbability Measures Via PreimagesProbability Measures Via Preimages
• Interpret (max 0.5 (random)) as , defined
• Derive measure of (max 0.5 (random)) as
where
6666666666666666
Probability Measures Via PreimagesProbability Measures Via PreimagesProbability Measures Via Preimages
• Interpret (max 0.5 (random)) as , defined
• Derive measure of (max 0.5 (random)) as
where
• Factored into random and deterministic parts:
6666666666666666
Probability Measures Via PreimagesProbability Measures Via PreimagesProbability Measures Via Preimages
• Interpret (max 0.5 (random)) as , defined
• Derive measure of (max 0.5 (random)) as
where
• Factored into random and deterministic parts:
• In other words, compute measures of expressions by runningthem backwards
6666666666666666
Crazy Idea is Feasible If...Crazy Idea is Feasible If...Crazy Idea is Feasible If...
• Seems like we need:
7777777777777777
Crazy Idea is Feasible If...Crazy Idea is Feasible If...Crazy Idea is Feasible If...
• Seems like we need:
Standard interpretation of programs as pure functions from arandom source
7777777777777777
Crazy Idea is Feasible If...Crazy Idea is Feasible If...Crazy Idea is Feasible If...
• Seems like we need:
Standard interpretation of programs as pure functions from arandom source
Efficient way to compute preimage sets
7777777777777777
Crazy Idea is Feasible If...Crazy Idea is Feasible If...Crazy Idea is Feasible If...
• Seems like we need:
Standard interpretation of programs as pure functions from arandom source
Efficient way to compute preimage sets
Efficient representation of arbitrary sets
7777777777777777
Crazy Idea is Feasible If...Crazy Idea is Feasible If...Crazy Idea is Feasible If...
• Seems like we need:
Standard interpretation of programs as pure functions from arandom source
Efficient way to compute preimage sets
Efficient representation of arbitrary sets
Efficient way to compute areas of preimage sets
7777777777777777
Crazy Idea is Feasible If...Crazy Idea is Feasible If...Crazy Idea is Feasible If...
• Seems like we need:
Standard interpretation of programs as pure functions from arandom source
Efficient way to compute preimage sets
Efficient representation of arbitrary sets
Efficient way to compute areas of preimage sets
Proof of correctness w.r.t. standard interpretation
7777777777777777
Crazy Idea is Feasible If...Crazy Idea is Feasible If...Crazy Idea is Feasible If...
• Seems like we need:
Standard interpretation of programs as pure functions from arandom source
Efficient way to compute preimage sets
Efficient representation of arbitrary sets
Efficient way to compute areas of preimage sets
Proof of correctness w.r.t. standard interpretation
• WAT
7777777777777777
What About Approximating?What About Approximating?What About Approximating?
Conservative approximation with rectangles:
8888888888888888
What About Approximating?What About Approximating?What About Approximating?
Conservative approximation with rectangles:
8888888888888888
What About Approximating?What About Approximating?What About Approximating?
Restricting preimages to rectangular subdomains:
9999999999999999
What About Approximating?What About Approximating?What About Approximating?
Restricting preimages to rectangular subdomains:
9999999999999999
What About Approximating?What About Approximating?What About Approximating?
Restricting preimages to rectangular subdomains:
9999999999999999
What About Approximating?What About Approximating?What About Approximating?
Restricting preimages to rectangular subdomains:
9999999999999999
What About Approximating?What About Approximating?What About Approximating?
Restricting preimages to rectangular subdomains:
9999999999999999
What About Approximating?What About Approximating?What About Approximating?
Restricting preimages to rectangular subdomains:
9999999999999999
What About Approximating?What About Approximating?What About Approximating?
Restricting preimages to rectangular subdomains:
9999999999999999
What About Approximating?What About Approximating?What About Approximating?
Restricting preimages to rectangular subdomains:
9999999999999999
What About Approximating?What About Approximating?What About Approximating?
Restricting preimages to rectangular subdomains:
9999999999999999
What About Approximating?What About Approximating?What About Approximating?
Restricting preimages to rectangular subdomains:
9999999999999999
What About Approximating?What About Approximating?What About Approximating?
Restricting preimages to rectangular subdomains:
9999999999999999
What About Approximating?What About Approximating?What About Approximating?
Restricting preimages to rectangular subdomains:
9999999999999999
What About Approximating?What About Approximating?What About Approximating?
Sampling: exponential to quadratic (e.g. days to minutes)
10101010101010101010101010101010
What About Approximating?What About Approximating?What About Approximating?
Sampling: exponential to quadratic (e.g. days to minutes)
10101010101010101010101010101010
Contribution: Making This Crazy Idea FeasibleContribution: Making This Crazy Idea FeasibleContribution: Making This Crazy Idea Feasible
• Standard interpretation of programs as pure functions from arandom source
• Efficient way to compute preimage sets
• Efficient representation of arbitrary sets
• Efficient way to compute volumes of preimage sets
• Proof of correctness w.r.t. standard interpretation
11111111111111111111111111111111
Contribution: Making This Crazy Idea FeasibleContribution: Making This Crazy Idea FeasibleContribution: Making This Crazy Idea Feasible
• Standard interpretation of programs as pure functions from arandom source
• Efficient way to compute abstract preimage subsets
• Efficient representation of arbitrary sets
• Efficient way to compute volumes of preimage sets
• Proof of correctness w.r.t. standard interpretation
11111111111111111111111111111111
Contribution: Making This Crazy Idea FeasibleContribution: Making This Crazy Idea FeasibleContribution: Making This Crazy Idea Feasible
• Standard interpretation of programs as pure functions from arandom source
• Efficient way to compute abstract preimage subsets
• Efficient representation of abstract sets
• Efficient way to compute volumes of preimage sets
• Proof of correctness w.r.t. standard interpretation
11111111111111111111111111111111
Contribution: Making This Crazy Idea FeasibleContribution: Making This Crazy Idea FeasibleContribution: Making This Crazy Idea Feasible
• Standard interpretation of programs as pure functions from arandom source
• Efficient way to compute abstract preimage subsets
• Efficient representation of abstract sets
• Efficient way to sample uniformly in preimage sets
• Proof of correctness w.r.t. standard interpretation
11111111111111111111111111111111
Contribution: Making This Crazy Idea FeasibleContribution: Making This Crazy Idea FeasibleContribution: Making This Crazy Idea Feasible
• Standard interpretation of programs as pure functions from arandom source
• Efficient way to compute abstract preimage subsets
• Efficient representation of abstract sets
• Efficient way to sample uniformly in preimage sets
Efficient domain partition sampling
• Proof of correctness w.r.t. standard interpretation
11111111111111111111111111111111
Contribution: Making This Crazy Idea FeasibleContribution: Making This Crazy Idea FeasibleContribution: Making This Crazy Idea Feasible
• Standard interpretation of programs as pure functions from arandom source
• Efficient way to compute abstract preimage subsets
• Efficient representation of abstract sets
• Efficient way to sample uniformly in preimage sets
Efficient domain partition sampling
Efficient way to determine whether a domain sample isactually in the preimage (just use standard interpretation)
• Proof of correctness w.r.t. standard interpretation
11111111111111111111111111111111
How Many Meanings?How Many Meanings?How Many Meanings?
• Start with pure programs, then lift by threading a random store
11111111111111111111111111111111
How Many Meanings?How Many Meanings?How Many Meanings?
• Start with pure programs, then lift by threading a random store
• Nonrecursive, nonprobabilistic programs: , ,
11111111111111111111111111111111
How Many Meanings?How Many Meanings?How Many Meanings?
• Start with pure programs, then lift by threading a random store
• Nonrecursive, nonprobabilistic programs: , ,
• Add 3 semantic functions for recursion and probabilistic choice
11111111111111111111111111111111
How Many Meanings?How Many Meanings?How Many Meanings?
• Start with pure programs, then lift by threading a random store
• Nonrecursive, nonprobabilistic programs: , ,
• Add 3 semantic functions for recursion and probabilistic choice
• Full development needs 2 more to transfer theorems frommeasure theory...
11111111111111111111111111111111
How Many Meanings?How Many Meanings?How Many Meanings?
• Start with pure programs, then lift by threading a random store
• Nonrecursive, nonprobabilistic programs: , ,
• Add 3 semantic functions for recursion and probabilistic choice
• Full development needs 2 more to transfer theorems frommeasure theory...
• ... oh, and 1 more to collect information for Monte Carlointegration
11111111111111111111111111111111
How Many Meanings?How Many Meanings?How Many Meanings?
• Start with pure programs, then lift by threading a random store
• Nonrecursive, nonprobabilistic programs: , ,
• Add 3 semantic functions for recursion and probabilistic choice
• Full development needs 2 more to transfer theorems frommeasure theory...
• ... oh, and 1 more to collect information for Monte Carlointegration
Tally: 3+3+2+1 = 9 semantic functions, 11 or 12 rules each
11111111111111111111111111111111
Enter Category TheoryEnter Category TheoryEnter Category Theory
• Moggi (1989): Introduces monads for interpreting effects
11111111111111111111111111111111
Enter Category TheoryEnter Category TheoryEnter Category Theory
• Moggi (1989): Introduces monads for interpreting effects
• Other kinds of categories: idioms, arrows
11111111111111111111111111111111
Enter Category TheoryEnter Category TheoryEnter Category Theory
• Moggi (1989): Introduces monads for interpreting effects
• Other kinds of categories: idioms, arrows
• Arrow defined by type constructor and thesecombinators:
11111111111111111111111111111111
Enter Category TheoryEnter Category TheoryEnter Category Theory
• Moggi (1989): Introduces monads for interpreting effects
• Other kinds of categories: idioms, arrows
• Arrow defined by type constructor and thesecombinators:
• Arrows are always function-like
11111111111111111111111111111111
Reducing ComplexityReducing ComplexityReducing Complexity
Function arrow: is just
11111111111111111111111111111111
Reducing ComplexityReducing ComplexityReducing Complexity
Function arrow: is just
11111111111111111111111111111111
Reducing ComplexityReducing ComplexityReducing Complexity
Function arrow: is just
11111111111111111111111111111111
Reducing ComplexityReducing ComplexityReducing Complexity
Function arrow: is just
11111111111111111111111111111111
Reducing ComplexityReducing ComplexityReducing Complexity
Function arrow: is just
11111111111111111111111111111111
Reducing ComplexityReducing ComplexityReducing Complexity
Function arrow: is just
11111111111111111111111111111111
Reducing ComplexityReducing ComplexityReducing Complexity
Function arrow: is just
11111111111111111111111111111111
Reducing ComplexityReducing ComplexityReducing Complexity
Function arrow: is just
11111111111111111111111111111111
Reducing ComplexityReducing ComplexityReducing Complexity
Function arrow: is just
11111111111111111111111111111111
Pair PreimagesPair PreimagesPair Preimages
12121212121212121212121212121212
Pair PreimagesPair PreimagesPair Preimages
12121212121212121212121212121212
Pair PreimagesPair PreimagesPair Preimages
:
12121212121212121212121212121212
Pair PreimagesPair PreimagesPair Preimages
and :
12121212121212121212121212121212
Pair PreimagesPair PreimagesPair Preimages
:
12121212121212121212121212121212
Correctness Theorems For Low, Low PricesCorrectness Theorems For Low, Low PricesCorrectness Theorems For Low, Low Prices
• Define
12121212121212121212121212121212
Correctness Theorems For Low, Low PricesCorrectness Theorems For Low, Low PricesCorrectness Theorems For Low, Low Prices
• Define
• Derive and others so that distributes; e.g.
12121212121212121212121212121212
Correctness Theorems For Low, Low PricesCorrectness Theorems For Low, Low PricesCorrectness Theorems For Low, Low Prices
• Define
• Derive and others so that distributes; e.g.
• Distributive properties makes proving this very easy:
Theorem (correctness). For all , .
12121212121212121212121212121212
Correctness Theorems For Low, Low PricesCorrectness Theorems For Low, Low PricesCorrectness Theorems For Low, Low Prices
• Define
• Derive and others so that distributes; e.g.
• Distributive properties makes proving this very easy:
Theorem (correctness). For all , .
In English: computes preimages under .
12121212121212121212121212121212
Correctness Theorems For Low, Low PricesCorrectness Theorems For Low, Low PricesCorrectness Theorems For Low, Low Prices
• Define
• Derive and others so that distributes; e.g.
• Distributive properties makes proving this very easy:
Theorem (correctness). For all , .
In English: computes preimages under .
• Other correctness proofs are similarly easy: prove 5 distributiveproperties
12121212121212121212121212121212
Correctness Theorems For Low, Low PricesCorrectness Theorems For Low, Low PricesCorrectness Theorems For Low, Low Prices
• Define
• Derive and others so that distributes; e.g.
• Distributive properties makes proving this very easy:
Theorem (correctness). For all , .
In English: computes preimages under .
• Other correctness proofs are similarly easy: prove 5 distributiveproperties
• Can add (random) and recursion to all semantics in one shot
12121212121212121212121212121212
AbstractionAbstractionAbstraction
Rectangle: An interval or union of intervals, a subset of , or for rectangles and
13131313131313131313131313131313
AbstractionAbstractionAbstraction
Rectangle: An interval or union of intervals, a subset of , or for rectangles and
• Easy representation; easy intersection, join (whichoverapproximates union), empty test, etc.
13131313131313131313131313131313
AbstractionAbstractionAbstraction
Rectangle: An interval or union of intervals, a subset of , or for rectangles and
• Easy representation; easy intersection, join (whichoverapproximates union), empty test, etc.
• Define (and therefore ) by replacing sets and setoperations with rectangles and rectangle operations
13131313131313131313131313131313
AbstractionAbstractionAbstraction
Rectangle: An interval or union of intervals, a subset of , or for rectangles and
• Easy representation; easy intersection, join (whichoverapproximates union), empty test, etc.
• Define (and therefore ) by replacing sets and setoperations with rectangles and rectangle operations
• Recursion is somewhat tricky—requires fine control overrecursion depth or if choices
13131313131313131313131313131313
In Theory...In Theory...In Theory...
Theorem (sound). computes overapproximations of thepreimages computed by .
• Consequence: Sampling in abstract preimages doesn’t leaveanything out
14141414141414141414141414141414
In Theory...In Theory...In Theory...
Theorem (sound). computes overapproximations of thepreimages computed by .
• Consequence: Sampling in abstract preimages doesn’t leaveanything out
Theorem (decreasing). never returns preimages larger thanthe given subdomain.
• Consequence: Refining abstract preimage sets never results in aworse approximation
14141414141414141414141414141414
In Theory...In Theory...In Theory...
Theorem (sound). computes overapproximations of thepreimages computed by .
• Consequence: Sampling in abstract preimages doesn’t leaveanything out
Theorem (decreasing). never returns preimages larger thanthe given subdomain.
• Consequence: Refining abstract preimage sets never results in aworse approximation
Theorem (monotone). is monotone.
• Consequence: Partitioning and then refining never results in aworse approximation 14141414141414141414141414141414
In Practice...In Practice...In Practice...
Theorems prove this always works:
15151515151515151515151515151515
In Practice...In Practice...In Practice...
Theorems prove this always works:
15151515151515151515151515151515
In Practice...In Practice...In Practice...
Theorems prove this always works:
15151515151515151515151515151515
In Practice...In Practice...In Practice...
Theorems prove this always works:
15151515151515151515151515151515
In Practice...In Practice...In Practice...
Theorems prove this always works:
15151515151515151515151515151515
In Practice...In Practice...In Practice...
Theorems prove this always works:
15151515151515151515151515151515
In Practice...In Practice...In Practice...
Theorems prove this always works:
15151515151515151515151515151515
In Practice...In Practice...In Practice...
Theorems prove this always works:
15151515151515151515151515151515
In Practice...In Practice...In Practice...
Theorems prove this always works:
15151515151515151515151515151515
Program Domain ValuesProgram Domain ValuesProgram Domain Values
16161616161616161616161616161616
Program Domain ValuesProgram Domain ValuesProgram Domain Values
• Program inputs are infinite binary trees:
16161616161616161616161616161616
Program Domain ValuesProgram Domain ValuesProgram Domain Values
• Program inputs are infinite binary trees:
• Every expression in a program is assigned a node
16161616161616161616161616161616
Program Domain ValuesProgram Domain ValuesProgram Domain Values
• Program inputs are infinite binary trees:
• Every expression in a program is assigned a node
• Implemented using lazy trees of random values
16161616161616161616161616161616
Program Domain ValuesProgram Domain ValuesProgram Domain Values
• Program inputs are infinite binary trees:
• Every expression in a program is assigned a node
• Implemented using lazy trees of random values
• No probability density for domain, but there is a measure 16161616161616161616161616161616
Example: Stochastic Ray TracingExample: Stochastic Ray TracingExample: Stochastic Ray Tracing
17171717171717171717171717171717
Example: Probabilistic VerificationExample: Probabilistic VerificationExample: Probabilistic Verification
(struct/drbayes float-any ())(struct/drbayes float (value error))
18181818181818181818181818181818
Example: Probabilistic VerificationExample: Probabilistic VerificationExample: Probabilistic Verification
(struct/drbayes float-any ())(struct/drbayes float (value error))
(define/drbayes (flsqrt x) (if (float-any? x)
x(let ([v (float-value x)]
[e (float-error x)]) (cond [(negative? v) (float-any)]
[(zero? v) (float 0 0)][else (float (sqrt v)
(+ (- 1 (sqrt (- 1 e)))(* 1/2 epsilon)))]))))
18181818181818181818181818181818
Example: Probabilistic VerificationExample: Probabilistic VerificationExample: Probabilistic Verification
(struct/drbayes float-any ())(struct/drbayes float (value error))
(define/drbayes (flsqrt x) (if (float-any? x)
x(let ([v (float-value x)]
[e (float-error x)]) (cond [(negative? v) (float-any)]
[(zero? v) (float 0 0)][else (float (sqrt v)
(+ (- 1 (sqrt (- 1 e)))(* 1/2 epsilon)))]))))
• Idea: sample e where (> (float-error e) threshold)
18181818181818181818181818181818
Example: Probabilistic VerificationExample: Probabilistic VerificationExample: Probabilistic Verification
(struct/drbayes float-any ())(struct/drbayes float (value error))
(define/drbayes (flsqrt x) (if (float-any? x)
x(let ([v (float-value x)]
[e (float-error x)]) (cond [(negative? v) (float-any)]
[(zero? v) (float 0 0)][else (float (sqrt v)
(+ (- 1 (sqrt (- 1 e)))(* 1/2 epsilon)))]))))
• Idea: sample e where (> (float-error e) threshold)
• Verified flhypot, flsqrt1pm1, flsinh in Racket’s mathlibrary, as well as others
18181818181818181818181818181818
Examples: Other Inference TasksExamples: Other Inference TasksExamples: Other Inference Tasks
• Typical Bayesian inference
Hierarchical models
Bayesian regression
Model selection
19191919191919191919191919191919
Examples: Other Inference TasksExamples: Other Inference TasksExamples: Other Inference Tasks
• Typical Bayesian inference
Hierarchical models
Bayesian regression
Model selection
• Atypical
Programs that halt with probability < 1, or never halt
Probabilistic context-free grammars with context-sensitiveconstraints
19191919191919191919191919191919
SummarySummarySummary
• Probabilistic inference is hard, so PPLs have been popping up
20202020202020202020202020202020
SummarySummarySummary
• Probabilistic inference is hard, so PPLs have been popping up
• Interpreting every program requires measure theory
20202020202020202020202020202020
SummarySummarySummary
• Probabilistic inference is hard, so PPLs have been popping up
• Interpreting every program requires measure theory
• Defined a semantics that computes preimages
20202020202020202020202020202020
SummarySummarySummary
• Probabilistic inference is hard, so PPLs have been popping up
• Interpreting every program requires measure theory
• Defined a semantics that computes preimages
• Measuring abstract preimages or sampling in them carries outinference
20202020202020202020202020202020
SummarySummarySummary
• Probabilistic inference is hard, so PPLs have been popping up
• Interpreting every program requires measure theory
• Defined a semantics that computes preimages
• Measuring abstract preimages or sampling in them carries outinference
• Can do a lot of cool stuff that’s normally inaccessible
20202020202020202020202020202020
https://github.com/ntoronto/drbayes
21212121212121212121212121212121