Spreadsheet Programming using Examples
Sumit Gulwani
Keynote at SEMS, July 2016
Motivation
99% of computer users cannot program!
They struggle with simple repetitive tasks.
Programming by examples (PBE) can revolutionize this landscape!
Spreadsheet help forums
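Typical help-forum interaction
Input → Output
300_w5_aniSh_c1_b → w5
300_w30_aniSh_c1_b → w30
First answer: =MID(B1,5,2) (always takes two characters starting at position 5, so it returns "w3" for the second row)
Revised answer:
=MID(B1,FIND("_",$B:$B)+1,
FIND("_",REPLACE($B:$B,1,FIND("_",$B:$B),""))-1)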
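Why the first answer breaks and the revised one works can be checked with a small Python sketch. It emulates the three Excel functions involved (MID, FIND, REPLACE) using hypothetical helper names (`excel_mid`, `excel_find`, `excel_replace`) and reads a single cell `B1` where the forum formula used `$B:$B`. It is an illustration only, not how Excel evaluates formulas.

```python
def excel_mid(text: str, start: int, num_chars: int) -> str:
    """Emulate Excel MID: 1-based start position, at most num_chars characters."""
    return text[start - 1:start - 1 + num_chars]


def excel_find(find_text: str, within_text: str) -> int:
    """Emulate Excel FIND: 1-based position of the first (case-sensitive) match."""
    index = within_text.find(find_text)
    if index < 0:
        raise ValueError("#VALUE!")  # Excel reports #VALUE! when there is no match
    return index + 1


def excel_replace(old_text: str, start: int, num_chars: int, new_text: str) -> str:
    """Emulate Excel REPLACE: overwrite num_chars characters starting at 1-based start."""
    return old_text[:start - 1] + new_text + old_text[start - 1 + num_chars:]


def first_answer(b1: str) -> str:
    # =MID(B1,5,2): always two characters starting at position 5
    return excel_mid(b1, 5, 2)


def revised_answer(b1: str) -> str:
    # =MID(B1, FIND("_",B1)+1, FIND("_",REPLACE(B1,1,FIND("_",B1),""))-1)
    first_underscore = excel_find("_", b1)
    after_first = excel_replace(b1, 1, first_underscore, "")  # text after the first "_"
    return excel_mid(b1, first_underscore + 1, excel_find("_", after_first) - 1)


for cell in ["300_w5_aniSh_c1_b", "300_w30_aniSh_c1_b"]:
    print(cell, first_answer(cell), revised_answer(cell))
```

The fixed-position formula only fits the first row; the delimiter-based one reads up to the second underscore, which is exactly the kind of generalization a user would rather demonstrate by example than write by hand.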
Flash Fill (Excel 2013 feature) demo
“Automating string processing in spreadsheets using input-output examples”; POPL 2011; Sumit Gulwani
Number Transformations
Input → Output (Nearest lower half hour):
0d 5h 26m → 5:00
0d 4h 57m → 4:30
0d 4h 27m → 4:00
0d 3h 57m → 3:30
Input → Output (Round to 2 decimal places):
123.4567 → 123.46
123.4 → 123.40
78.234 → 78.23
“Synthesizing Number Transformations from Input-Output Examples”; CAV 2012; Singh, Gulwani
Equivalent format strings for rounding to 2 decimal places:
Excel/C#: #.00
Python/C: .2f
Java (DecimalFormat): #.00
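A minimal Python sketch of the two number transformations shown by the examples. It assumes inputs always have the shape `0d 5h 26m` and that whole days should be folded into hours (the examples only show `0d`); the helper names are illustrative.

```python
import re


def floor_to_half_hour(duration: str) -> str:
    """Map e.g. '0d 5h 26m' to '5:00' by rounding down to the nearest 30 minutes."""
    match = re.fullmatch(r"(\d+)d (\d+)h (\d+)m", duration)
    if match is None:
        raise ValueError(f"unexpected duration format: {duration!r}")
    days, hours, minutes = (int(g) for g in match.groups())
    total = (days * 24 + hours) * 60 + minutes  # assumption: days fold into hours
    floored = total - total % 30                # drop the minutes past the last half hour
    return f"{floored // 60}:{floored % 60:02d}"


def round_to_2dp(value: float) -> str:
    """Round to 2 decimal places using Python's '.2f' format specification."""
    return format(value, ".2f")


for d in ["0d 5h 26m", "0d 4h 57m", "0d 4h 27m", "0d 3h 57m"]:
    print(d, "->", floor_to_half_hour(d))
for x in [123.4567, 123.4, 78.234]:
    print(x, "->", round_to_2dp(x))
```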
Semantic String Transformations
Input v1    Input v2      Output (Price + Markup*Price)
Stroller    10/12/2010    $145.67 + 0.30*145.67
Bib         23/12/2010    $3.56 + 0.45*3.56
Diapers     21/1/2011
Wipes       2/4/2009
Aspirator   23/2/2010
MarkupRec Table
Id    Name        Markup
S33   Stroller    30%
B56   Bib         45%
D32   Diapers     35%
W98   Wipes       40%
A46   Aspirator   30%
CostRec Table
Id    Date       Price
S33   12/2010    $145.67
S33   11/2010    $142.38
B56   12/2010    $3.56
D32   1/2011     $21.45
W98   4/2009     $5.12
“Learning Semantic String Transformations from Examples”; VLDB 2012; Singh, Gulwani
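The two example rows imply a pair of table lookups: from the item name to its Id and Markup in MarkupRec, then from the Id and the month/year of the date to the Price in CostRec. The Python sketch below hand-codes that logic; it is not the program the cited system synthesizes, and reading dates as day/month/year is an assumption based on values such as 23/12/2010.

```python
from typing import Optional

# MarkupRec: Name -> (Id, Markup), copied from the table above
MARKUP_REC = {
    "Stroller": ("S33", "30%"),
    "Bib": ("B56", "45%"),
    "Diapers": ("D32", "35%"),
    "Wipes": ("W98", "40%"),
    "Aspirator": ("A46", "30%"),
}
# CostRec: (Id, month/year) -> Price, copied from the table above
COST_REC = {
    ("S33", "12/2010"): "$145.67",
    ("S33", "11/2010"): "$142.38",
    ("B56", "12/2010"): "$3.56",
    ("D32", "1/2011"): "$21.45",
    ("W98", "4/2009"): "$5.12",
}


def price_plus_markup(name: str, date: str) -> Optional[str]:
    """Build 'Price + Markup*Price' for an item name and a day/month/year date."""
    item_id, markup = MARKUP_REC[name]                        # join on Name
    _day, month, year = date.split("/")
    price = COST_REC.get((item_id, f"{int(month)}/{year}"))   # join on Id and month/year
    if price is None:
        return None                                           # no CostRec row for that month
    rate = int(markup.rstrip("%")) / 100
    return f"{price} + {rate:.2f}*{price.lstrip('$')}"


for name, date in [("Stroller", "10/12/2010"), ("Bib", "23/12/2010"),
                   ("Diapers", "21/1/2011"), ("Wipes", "2/4/2009"), ("Aspirator", "23/2/2010")]:
    print(name, date, price_plus_markup(name, date))
```

Note that Stroller picks the 12/2010 price rather than 11/2010 because the join uses the month of the input date, and Aspirator yields `None` because CostRec has no matching row.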
To get Started!
Data Science Class Assignment
FlashExtract Demo
Ships inside two Microsoft products:
• PowerShell: ConvertFrom-String cmdlet
• Custom Log, Custom Field
“FlashExtract: A Framework for data extraction by examples”; PLDI 2014; Vu Le, Sumit Gulwani
Layout Transformations
[Figure: Input Table and Output Table]
PBE allows creation of the output table from a couple of example tuples.
“FlashRelate: extracting relational data from semi-structured spreadsheets using examples”; PLDI 2014; Barowy, Gulwani, Hart, Zorn
Programming-by-Examples Architecture
[Diagram: Example-based Intent → Program Synthesizer (for DSL D) → Program set (a sub-DSL of D)]
Domain-specific Language (DSL)
• Balanced Expressiveness
– Expressive enough to cover a wide range of tasks
– Restricted enough to enable efficient search
• Restricted set of operators
– those with small inverse sets
• Restricted syntactic composition of those operators
• Natural computation patterns
– Increased user understanding/confidence
– Enables selection between programs, editing
Flash Fill DSL (String Transformations)
Tuple(String x1, …, String xn) → String
top-level expr         T := C | if-then-else(B, C, T)
condition-free expr    C := A | Concatenate(A, C)
atomic expression      A := ConstantString | SubStr(X, P, P)
input string           X := x1 | … | xn
position expression    P := K | Pos(X, R1, R2, K)
Boolean expression     B := …
Pos(X, R1, R2, K): Kth position in X whose left/right side matches with R1/R2.
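To make the grammar concrete, here is a small Python interpreter for its condition-free fragment (C, A, P). It is a sketch, not Flash Fill's implementation, and it makes simplifying assumptions: a single input string plays the role of X, R1/R2 are Python regular expressions rather than token sequences, K is 1-based with negative values counting from the end, and if-then-else and Boolean expressions are left out. With it, the w5/w30 help-forum task becomes `SubStr(Pos("_", "", 1), Pos("", "_", 2))`, the text between the end of the first underscore and the start of the second.

```python
import re
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class CPos:
    """P := K, a constant position (assumption: negative K counts from the end, -1 = end)."""
    k: int


@dataclass(frozen=True)
class Pos:
    """P := Pos(X, R1, R2, K): Kth position whose left side matches R1 and right side R2."""
    r1: str
    r2: str
    k: int


@dataclass(frozen=True)
class ConstStr:
    """A := ConstantString."""
    s: str


@dataclass(frozen=True)
class SubStr:
    """A := SubStr(X, P, P)."""
    p1: Union[CPos, Pos]
    p2: Union[CPos, Pos]


@dataclass(frozen=True)
class Concat:
    """C := Concatenate(A, C)."""
    a: Union[ConstStr, SubStr]
    c: Union[ConstStr, SubStr, "Concat"]


def eval_pos(p: Union[CPos, Pos], x: str) -> int:
    if isinstance(p, CPos):
        return p.k if p.k >= 0 else len(x) + p.k + 1
    if p.k == 0:
        raise ValueError("K must be non-zero")
    # every boundary t where x[:t] ends with R1 and x[t:] starts with R2
    hits = [t for t in range(len(x) + 1)
            if re.search(rf"(?:{p.r1})\Z", x[:t]) and re.match(rf"(?:{p.r2})", x[t:])]
    return hits[p.k - 1] if p.k > 0 else hits[p.k]  # IndexError if there is no Kth match


def evaluate(e, x: str) -> str:
    if isinstance(e, ConstStr):
        return e.s
    if isinstance(e, SubStr):
        return x[eval_pos(e.p1, x):eval_pos(e.p2, x)]
    if isinstance(e, Concat):
        return evaluate(e.a, x) + evaluate(e.c, x)
    raise TypeError(f"not a condition-free expression: {e!r}")


w_field = SubStr(Pos("_", "", 1), Pos("", "_", 2))
bracketed = Concat(ConstStr("["), Concat(w_field, ConstStr("]")))
for cell in ["300_w5_aniSh_c1_b", "300_w30_aniSh_c1_b"]:
    print(evaluate(w_field, cell), evaluate(bracketed, cell))
```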
Search Methodology
Goal: Set of program expressions of kind $e$ that satisfy spec $\phi$ [denoted $e \models \phi$]
$e$: DSL (top-level) expression
$\phi$: Conjunction of (input state $\sigma \rightsquigarrow$ output value $v$)
Methodology: Based on divide-and-conquer style problem decomposition.
• $e \models \phi$ is reduced to simpler problems (over sub-expressions of $e$ or sub-constraints of $\phi$).
• Top-down (as opposed to bottom-up enumerative search).
“FlashMeta: A Framework for Inductive Program Synthesis”; OOPSLA 2015; Alex Polozov, Sumit Gulwani
Problem Reduction Rules
Let $e$ be a non-terminal defined as $e := e_1 \mid e_2$.
$[e \models \phi] = \mathrm{Union}([e_1 \models \phi], [e_2 \models \phi])$
$[e \models \phi_1 \wedge \phi_2] = \mathrm{Intersect}([e \models \phi_1], [e \models \phi_2])$
An alternative strategy:
$[e \models \phi_1 \wedge \phi_2] = \mathrm{Filter}([e \models \phi_1], \phi_2)$
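These reduction rules can be prototyped in Python over explicit, finite program sets. The grammar `e := e1 | e2` below, its four programs, and all function names are hypothetical. The point of the sketch is that the Intersect strategy and the Filter strategy return the same set; Filter simply runs the candidates on the remaining examples instead of solving a second subproblem.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Example = Tuple[str, str]  # (input state σ, output value v)

# Hypothetical grammar e := e1 | e2, each alternative a finite set of named programs.
E1: Dict[str, Callable[[str], str]] = {"upper": str.upper, "lower": str.lower}
E2: Dict[str, Callable[[str], str]] = {
    "first_word": lambda s: s.split()[0],
    "identity": lambda s: s,
}
ALL_PROGRAMS = {**E1, **E2}


def learn_alternative(alternative: Dict[str, Callable[[str], str]],
                      example: Example) -> FrozenSet[str]:
    sigma, v = example
    return frozenset(name for name, f in alternative.items() if f(sigma) == v)


def learn_e(example: Example) -> FrozenSet[str]:
    # [e ⊨ φ] = Union([e1 ⊨ φ], [e2 ⊨ φ])
    return learn_alternative(E1, example) | learn_alternative(E2, example)


def learn_by_intersect(spec: List[Example]) -> FrozenSet[str]:
    # [e ⊨ φ1 ∧ φ2] = Intersect([e ⊨ φ1], [e ⊨ φ2]), applied example by example
    result = learn_e(spec[0])
    for example in spec[1:]:
        result &= learn_e(example)
    return result


def learn_by_filter(spec: List[Example]) -> FrozenSet[str]:
    # [e ⊨ φ1 ∧ φ2] = Filter([e ⊨ φ1], φ2): learn from one example, check the rest
    candidates = learn_e(spec[0])
    return frozenset(name for name in candidates
                     if all(ALL_PROGRAMS[name](s) == v for s, v in spec[1:]))


spec = [("ben", "ben"), ("Ben Zorn", "Ben")]
print(sorted(learn_e(spec[0])))
print(sorted(learn_by_intersect(spec)), sorted(learn_by_filter(spec)))
```

The first example alone is ambiguous (three programs fit); either strategy narrows the set to the program that also fits the second example.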
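Problem Reduction Rules
Inverse Set: Let $F$ be an $n$-ary operator.
$F^{-1}(v) = \{(u_1, \ldots, u_n) \mid F(u_1, \ldots, u_n) = v\}$
$\mathit{Concat}^{-1}(\text{``Abc''}) = \{(\text{``Abc''}, \epsilon), (\text{``Ab''}, \text{``c''}), (\text{``A''}, \text{``bc''}), (\epsilon, \text{``Abc''})\}$
$[\mathit{Concat}(X, Y) \models (\sigma \rightsquigarrow \text{``Abc''})] = \mathrm{Union}(\{\mathit{Concat}([X \models \sigma \rightsquigarrow \text{``Abc''}], [Y \models \sigma \rightsquigarrow \epsilon]),\ \mathit{Concat}([X \models \sigma \rightsquigarrow \text{``Ab''}], [Y \models \sigma \rightsquigarrow \text{``c''}]),\ \mathit{Concat}([X \models \sigma \rightsquigarrow \text{``A''}], [Y \models \sigma \rightsquigarrow \text{``bc''}]),\ \mathit{Concat}([X \models \sigma \rightsquigarrow \epsilon], [Y \models \sigma \rightsquigarrow \text{``Abc''}])\})$
$[F(e_1, \ldots, e_n) \models \sigma \rightsquigarrow v] = \mathrm{Union}(\{F([e_1 \models \sigma \rightsquigarrow u_1], \ldots, [e_n \models \sigma \rightsquigarrow u_n]) \mid (u_1, \ldots, u_n) \in F^{-1}(v)\})$
where $F(S_1, \ldots, S_n)$ denotes $\{F(e_1, \ldots, e_n) \mid e_1 \in S_1, \ldots, e_n \in S_n\}$.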
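A Python sketch of the inverse-set rule for `Concat`, with programs enumerated as strings. The atom learner (a constant, or a substring of the input at some position) is a hypothetical stand-in for the real sub-DSL, and the explicit enumeration is for illustration only; a practical synthesizer would share sub-results rather than list every combination.

```python
from typing import List, Set, Tuple


def concat_inverse(v: str) -> List[Tuple[str, str]]:
    """Concat^{-1}(v): every (u1, u2) with u1 + u2 == v, longest prefix first."""
    return [(v[:i], v[i:]) for i in range(len(v), -1, -1)]


def learn_atom(sigma: str, u: str) -> Set[str]:
    """Hypothetical atom learner: a constant, or any substring of sigma equal to u."""
    programs = {f"Const({u!r})"}
    if u:  # only non-empty outputs are explained as substrings of the input
        programs |= {f"SubStr({i},{i + len(u)})"
                     for i in range(len(sigma) - len(u) + 1) if sigma.startswith(u, i)}
    return programs


def learn_concat(sigma: str, v: str) -> Set[str]:
    """[Concat(X, Y) ⊨ σ ⇝ v]: union over (u1, u2) in Concat^{-1}(v) of the product."""
    result: Set[str] = set()
    for u1, u2 in concat_inverse(v):
        result |= {f"Concat({x},{y})"
                   for x in learn_atom(sigma, u1) for y in learn_atom(sigma, u2)}
    return result


print(concat_inverse("Abc"))
programs = learn_concat("Abc Def", "Abc")
print(len(programs), "Concat(SubStr(0,2),SubStr(2,3))" in programs)
```

The four splits mirror $\mathit{Concat}^{-1}(\text{``Abc''})$; each split contributes the product of the sub-results for X and Y, which is where small inverse sets keep the search tractable.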
Programming-by-Examples Architecture
[Diagram: Example-based Intent → Program Synthesizer (for DSL D) → Program set (a sub-DSL of D) → Ranking fn → Ranked Program set (a sub-DSL of D)]
Ranking scheme: Program features
Prefer simpler programs
• Fewer constants.
• Smaller constants.
Input → Output
Rishabh Singh → Rishabh
Ben Zorn → Ben
Candidate programs:
• 1st Word
• If (input = “Rishabh Singh”) then “Rishabh” else “Ben”
• “Rishabh”
“Predicting a correct program in Programming by Example”; CAV 2015; Rishabh Singh, Sumit Gulwani
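A sketch of the "fewer constants, then smaller constants" preference in Python. Pairing each candidate with its string constants and comparing first by count and then by total length is an assumption about how the two criteria combine, not the published ranking function.

```python
from typing import Callable, List, NamedTuple, Tuple


class Candidate(NamedTuple):
    name: str
    run: Callable[[str], str]
    constants: Tuple[str, ...]  # string constants appearing in the program


CANDIDATES = [
    Candidate("1st Word", lambda s: s.split()[0], ()),
    Candidate("If-then-else",
              lambda s: "Rishabh" if s == "Rishabh Singh" else "Ben",
              ("Rishabh Singh", "Rishabh", "Ben")),
    Candidate("Constant", lambda s: "Rishabh", ("Rishabh",)),
]


def simplicity_key(c: Candidate) -> Tuple[int, int]:
    # Fewer constants first, then smaller (shorter) constants.
    return (len(c.constants), sum(len(k) for k in c.constants))


def rank(candidates: List[Candidate], examples: List[Tuple[str, str]]) -> List[str]:
    consistent = [c for c in candidates if all(c.run(i) == o for i, o in examples)]
    return [c.name for c in sorted(consistent, key=simplicity_key)]


examples = [("Rishabh Singh", "Rishabh"), ("Ben Zorn", "Ben")]
print(rank(CANDIDATES, examples))
print(rank(CANDIDATES, examples[:1]))
```

With only the first example all three candidates fit, and "1st Word" still ranks first because it uses no constants at all.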
Ranking scheme: Data features
How to select between programs with the same number of same-sized constants?
Input → Output
Missing page numbers, 1993 → 1993
64-67, 1995 → 1995
Candidate programs:
• 1st Number from the beginning
• 1st Number from the end
Prefer programs that generate more uniform output.
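One way to approximate "more uniform output" in Python: abstract each output to its character classes and count how many distinct shapes a program produces across all inputs, including inputs without a given output. This shape metric is an illustrative assumption, not the feature set used in the cited work.

```python
import re
from typing import Callable, List


def first_number_from_beginning(s: str) -> str:
    return re.findall(r"\d+", s)[0]


def first_number_from_end(s: str) -> str:
    return re.findall(r"\d+", s)[-1]


def shape(output: str) -> str:
    """Abstract an output to its character classes, e.g. '1993' -> '0000'."""
    return re.sub(r"[A-Za-z]", "a", re.sub(r"[0-9]", "0", output))


def non_uniformity(program: Callable[[str], str], inputs: List[str]) -> int:
    """Number of distinct output shapes the program produces over all inputs."""
    return len({shape(program(x)) for x in inputs})


inputs = ["Missing page numbers, 1993", "64-67, 1995"]
candidates = [first_number_from_beginning, first_number_from_end]
best = min(candidates, key=lambda p: non_uniformity(p, inputs))
print(best.__name__)
```

Both programs agree on the first row, but on "64-67, 1995" the first one returns "64", giving two output shapes instead of one, so the second program is preferred.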
Outline
• Core Synthesis Architecture
– Domain-specific Language
– Search methodology
– Ranking function
• Next generation Synthesis
– Interactive
– Predictive
– Adaptive
Programming-by-Examples Architecture
[Diagram: Example-based Intent → Program Synthesizer (DSL D, Ranking fn) → Ranked Program set (a sub-DSL of D); Test inputs → Debugging → Refined Intent (Incrementality); Intended Program in D → Translator → Intended Program in R/Python/C#/C++/…]
Interactive Debugging
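As a hedged illustration of the debugging loop in the architecture (test inputs, debugging, refined intent), the Python sketch below flags the test inputs on which the remaining candidate programs disagree, so the user only has to inspect those, and then keeps the programs consistent with the output the user confirms. The function names and the extra test row "Vol. 2004" are hypothetical.

```python
import re
from typing import Callable, Dict, List, Tuple

Program = Callable[[str], str]


def distinguishing_inputs(programs: Dict[str, Program], test_inputs: List[str]) -> List[str]:
    """Test inputs on which the remaining candidate programs disagree."""
    return [x for x in test_inputs if len({p(x) for p in programs.values()}) > 1]


def refine(programs: Dict[str, Program], example: Tuple[str, str]) -> Dict[str, Program]:
    """Keep only the programs consistent with a newly supplied example (refined intent)."""
    x, v = example
    return {name: p for name, p in programs.items() if p(x) == v}


# Both candidates agree with the first example "Missing page numbers, 1993" -> "1993".
programs: Dict[str, Program] = {
    "first number": lambda s: re.findall(r"\d+", s)[0],
    "last number": lambda s: re.findall(r"\d+", s)[-1],
}
test_inputs = ["64-67, 1995", "Vol. 2004"]  # "Vol. 2004" is a made-up extra row
flagged = distinguishing_inputs(programs, test_inputs)
print(flagged)
programs = refine(programs, (flagged[0], "1995"))  # the user confirms the intended output
print(list(programs))
```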
Predictive
• Intended programs can sometimes be synthesized from just the input.
– Tabular data extraction, Sort, Join
• Can save a large amount of user effort.
– User need not provide examples for each of tens of columns.
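As a toy illustration of synthesizing from the input alone, the Python sketch below guesses a column delimiter that occurs equally often on every line and splits accordingly, with no output examples at all. The candidate delimiter list and the heuristic are assumptions made for illustration; they are not the algorithm used for predictive tabular extraction in PROSE.

```python
from typing import List, Optional

# Assumed candidate delimiters, tried in this order.
CANDIDATE_DELIMITERS = [",", ";", "\t", "|", " "]


def predict_delimiter(lines: List[str]) -> Optional[str]:
    """Pick a delimiter that occurs the same, non-zero number of times on every line."""
    for d in CANDIDATE_DELIMITERS:
        counts = {line.count(d) for line in lines}
        if len(counts) == 1 and counts.pop() > 0:
            return d
    return None


def predict_table(lines: List[str]) -> List[List[str]]:
    """Split every line into columns without any user-provided output example."""
    d = predict_delimiter(lines)
    if d is None:
        return [[line] for line in lines]
    return [[field.strip() for field in line.split(d)] for line in lines]


rows = ["Stroller; S33; 30%", "Bib; B56; 45%", "Wipes; W98; 40%"]
print(predict_table(rows))
```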
Adaptive
• Learn from past interactions
– of the same user (personalized experience).
– of other users in the enterprise/cloud.
• As a result, synthesis sessions require less interaction.
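A hypothetical sketch of a learner component in Python: it records features of programs the user accepted in earlier sessions and subtracts a bonus from each candidate's base ranking cost. The class name, feature labels, and linear scoring are all assumptions, chosen only to show how interaction history can change the ranking.

```python
from collections import Counter
from typing import Dict, List


class AdaptiveRanker:
    """Hypothetical learner: favours program features that users accepted before."""

    def __init__(self, base_cost: Dict[str, float], weight: float = 1.0) -> None:
        self.base_cost = base_cost          # lower is better, e.g. from a simplicity ranking
        self.weight = weight                # how strongly history overrides the base ranking
        self.history: Counter = Counter()   # feature -> times seen in accepted programs

    def record_accepted(self, features: List[str]) -> None:
        self.history.update(features)

    def cost(self, program: str, features: List[str]) -> float:
        return self.base_cost[program] - self.weight * sum(self.history[f] for f in features)

    def rank(self, candidates: Dict[str, List[str]]) -> List[str]:
        return sorted(candidates, key=lambda p: self.cost(p, candidates[p]))


candidates = {
    "first number": ["Number", "FromStart"],
    "last number": ["Number", "FromEnd"],
}
ranker = AdaptiveRanker({"first number": 1.0, "last number": 1.0})
print(ranker.rank(candidates))   # tie: insertion order is kept
for _ in range(2):               # two earlier sessions ended with a FromEnd program
    ranker.record_accepted(["Number", "FromEnd"])
print(ranker.rank(candidates))
```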
Programming-by-Examples Architecture
[Diagram: Interaction history → Learner → Ranking fn, added to the components Example-based Intent, DSL D, Program Synthesizer, Ranked Program set (a sub-DSL of D), Test inputs, Debugging, Refined Intent, Incrementality, Intended Program in D, Translator, Intended Program in R/Python/C#/C++/…]
PROSE Framework
https://microsoft.github.io/prose
• Efficient implementation of the generic search methodology.
• Provides a library of reduction rules.
Role of synthesis designer
• Implement a DSL and provide reduction rules for new operators.
• Provide a ranking strategy.
• Can specify tactics to resolve non-determinism in search.
The PROSE Team
Vu Le, Sumit Gulwani, Daniel Perelman, Danny Simmons, Adam Smith, Mohammad Raza, Abhishek Udupa, Allen Cypher, Ranvijay Kumar, Alex Polozov
We are hiring interns/full-time!
Future Directions
• Learn from usage data
• Probabilistic noise handling
• Programming using natural language
• Application to robotics
Conclusion
• PBE can enable easier & faster data wrangling.
– 99% of computer users are non-programmers.
– Data scientists spend 80% of their time cleaning data.
• Algorithmic search
– Domain-specific language
– Deductive methodology based on back-propagation
• Ambiguity resolution
– Ranking
– Interactivity
Reference: “Programming by Examples (and its applications in Data Wrangling)”,
In Verification and Synthesis of Correct and Secure Systems; IOS Press; 2016
[based on Marktoberdorf Summer School 2015 Lecture Notes]