MICHAL MOSKAL AND NIKHIL SWAMY RESEARCH IN SOFTWARE ENGINEERING (RISE) MICROSOFT RESEARCH, REDMOND August 8 – 11, 2013 ICFP PROGRAMMING CONTEST

MICHAL MOSKAL AND NIKHIL SWAMY

RESEARCH IN SOFTWARE ENGINEERING (RISE)

MICROSOFT RESEARCH, REDMOND

August 8 – 11, 2013

ICFP PROGRAMMING CONTEST

ORGANIZE THE CONTEST? WHO, ME?! NO THANKS!

That's a shame … because …

THE CONTEST IS IN RUDE HEALTH!

More than 550 teams registered to participate

You have the undivided attention of more than 1000 expert programmers for 72 hours! (mostly)

Wow! 72k programmer hours! That's a really valuable resource!

Organize the contest? Hell yeah!

WHAT QUESTION DO WE WANT 1000 EXPERT PROGRAMMERS TO ANSWER?

Traditionally: Which is the best programming language?

WHICH IS THE BEST PROGRAMMING LANGUAGE?

Boring!

The answer is easy:

WHICH IS THE BEST PROGRAMMING LANGUAGE?

The question is a bit bogus

It depends on the programmer

Expert programmers can use whatever and do well

Even ASM has placed well in past ICFPCs

It depends on the task

Winning team this year used 6 languages for different sub-tasks

WHICH IS THE BEST PROGRAMMING LANGUAGE?

Let's not focus so much on this question …

QUESTION THIS YEAR:

What's up with program synthesis?

Can we calibrate research on program synthesis against what an army of crack programmers can do?

CALIBRATING PROGRAM SYNTHESIS

Synthesis of loop-free programs; Gulwani et al.; PLDI 2011

Uses an SMT solver to synthesize bit-vector programs

Scales to 16 instructions in at most 45 minutes

Applications to super-optimization etc.

Big improvement over prior tools

Sketch (2006): Solar-Lezama et al., scales to 8 instructions

AHA (2002): scales to 6-8 instructions

CALIBRATING PROGRAM SYNTHESIS

Synthesis of loop-free programs; Gulwani et al.; PLDI 2011

Uses an SMT solver to synthesize bit-vector programs

Scales to 16 instructions in at most 45 minutes

16 instructions is quite a lot!

SMT solvers are cool!

Naïvely, search space = ~10^16

But, is that it?

AUGUST 8 – 11, 2013

300+ teams wrote tools to synthesize bit-vector programs

We evaluated these tools on a set of 1,800 benchmark problems

Our main goal: How would the top teams fare against the best SMT solutions?

A (not-so-)secret hope: Some of the best teams would end up using SMT solvers

THE PROGRAM SYNTHESIS GAME

GAME: I have a secret program A. Can you guess what it is? You have 5 minutes.

PLAYER: Can you tell me what A(16), A(42), A(128) are?

GAME: A(16)=17, A(42)=43, A(128)=129.

PLAYER: Ah. I bet A = λx. x+1

GAME: Let me check … Nope. A(9)=9.

PLAYER: Hmm. Ok, so what is A(11) and A(12) then?

GAME: Since you ask so nicely: A(11)=12 and A(12)=13

PLAYER: Ah ha! I guess A = λx. if x & 1 = 0 then x else x + 1

GAME: Let me check … Yep! That's right! You score one point.

query.smt2: A ≈ λx. x+1 ?

Yes! / No! Counterexample: A(9) <> (λx.x+1) 9

query.smt2: A ≈ λx. if x&1=0…?
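
The exchange above has two kinds of request: asking for the secret program's value on chosen inputs, and submitting a guess that is either accepted or refuted with a counterexample. A minimal Python sketch of that protocol follows. The `Game` class and its `eval` and `guess` methods are hypothetical names, the secret is simply the final guess from the dialogue, and the equivalence check is approximated by probing inputs rather than by an SMT query, so an accepted guess here is only probably correct.

```python
import random

MASK = (1 << 64) - 1  # assume unsigned 64-bit values


class Game:
    """Hypothetical game server holding a secret function on 64-bit values."""

    def __init__(self, secret, samples=10_000, seed=0):
        self.secret = secret
        self.samples = samples
        self.rng = random.Random(seed)

    def eval(self, xs):
        """Answer a player's request for the secret's outputs on chosen inputs."""
        return [self.secret(x) & MASK for x in xs]

    def guess(self, candidate):
        """Return (True, None) if no difference is found, else (False, input)."""
        # Stand-in for the SMT query: small inputs first, then random ones.
        probes = list(range(256))
        probes += [self.rng.getrandbits(64) for _ in range(self.samples)]
        for x in probes:
            if (candidate(x) & MASK) != (self.secret(x) & MASK):
                return False, x  # counterexample input
        return True, None


game = Game(lambda x: x if x & 1 == 0 else x + 1)
first = game.guess(lambda x: x + 1)                         # refuted
second = game.guess(lambda x: x if x & 1 == 0 else x + 1)   # accepted
```

Probing can miss rare differences, which is why a real judge would want a decision procedure instead of sampling.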

PUNCH LINE

THE WINNING TEAMS WERE AMAZING!

Main goal: Calibration

Winners were synthesizing programs 40 instructions long!

Our reference SMT-based solutions maxed out at 15-16

Recall: the difficulty is exponential in the problem size

Secret hope: SMT usage

Many top-10 teams tried SMT, but all opted for hand-tuned, brute-force search with lots of smart pruning heuristics

Winning team parallelized the search and used 1000 hours of compute time on Amazon EC2
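
To give a flavor of what brute-force search with pruning can look like, here is a small Python sketch of bottom-up enumeration that discards any expression whose outputs on the example inputs match an expression already kept. This is a generic textbook-style technique, not a reconstruction of any team's solver. It covers only the constants, `x`, and the unary and binary operators, and the size measure (one per leaf or operator) is an assumption.

```python
import random

MASK = (1 << 64) - 1  # results are truncated to 64 bits

OP1 = {
    "not": lambda a: ~a & MASK,
    "shl1": lambda a: (a << 1) & MASK,
    "shr1": lambda a: a >> 1,
    "shr4": lambda a: a >> 4,
    "shr16": lambda a: a >> 16,
}
OP2 = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
    "plus": lambda a, b: (a + b) & MASK,
}


def evaluate(e, x):
    """Evaluate an expression tree (0, 1, "x", or an operator tuple) on x."""
    if e == "x":
        return x
    if e in (0, 1):
        return e
    if len(e) == 2:
        return OP1[e[0]](evaluate(e[1], x))
    return OP2[e[0]](evaluate(e[1], x), evaluate(e[2], x))


def candidates(size, by_size, inputs):
    """Yield (expr, outputs) pairs of exactly `size`, built from kept smaller ones."""
    if size == 1:
        yield 0, tuple(0 for _ in inputs)
        yield 1, tuple(1 for _ in inputs)
        yield "x", tuple(inputs)
        return
    for name, f in OP1.items():
        for e, sig in by_size[size - 1]:
            yield (name, e), tuple(f(v) for v in sig)
    for name, f in OP2.items():
        for left in range(1, size - 1):
            for l, ls in by_size[left]:
                for r, rs in by_size[size - 1 - left]:
                    yield (name, l, r), tuple(f(a, b) for a, b in zip(ls, rs))


def synthesize(inputs, outputs, max_size):
    """Return a smallest expression matching the examples, or None."""
    target, seen, by_size = tuple(outputs), set(), {}
    for size in range(1, max_size + 1):
        by_size[size] = []
        for e, sig in candidates(size, by_size, inputs):
            if sig in seen:
                continue  # pruned: behaves like a smaller kept expression
            seen.add(sig)
            by_size[size].append((e, sig))
            if sig == target:
                return e
    return None


rng = random.Random(1)
xs = [rng.getrandbits(64) for _ in range(8)]
ys = [((x << 1) | 1) & MASK for x in xs]
found = synthesize(xs, ys, max_size=5)
```

The pruning only guarantees agreement on the example inputs, so a result would still need to be confirmed by the judge's equivalence check.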

40 VS. 16! WHAT'S UP WITH THAT?

Elegant general-purpose formulations in terms of constraint solving:

Relatively easy to code up and obtain decent results

But, hand-tuned solutions are going to do better … MUCH BETTER

If you really want to super-optimize something: Smart search for 1000 hours is cheap!

1. NEED TO DECIDE EQUIVALENCE EFFECTIVELY

PLAYER: Ah ha! I guess A = λx. if x & 1 = 0 then x else x + 1

query.smt2: A ≈ λx. e ?

Yes! → GAME: Yep! That's right! You score one point.

No! Counterexample: A(17) <> (λx.e) 17 → GAME: No dice! A(17)=18.

\BV: FUNCTIONS ON 64-BIT VECTORS

p ::= λx.e

e ::= 0 | 1 | x | op1 e | e op2 e

| if0 e then e else e | fold e e λx y.e

op1 ::= not | shl1 | shr1 | shr4 | shr16

op2 ::= and | or | xor | plus

Z3 implements a decidable theory of bit-vectors

So, equivalence checking on \BV programs is decidable …

But, it's NP-hard and can be quite expensive
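
A small interpreter makes the grammar above concrete. The Python sketch below encodes expressions as nested tuples, which is an illustrative choice. The semantics of `fold` are an assumption based on the variable naming in this deck: the first operand is split into 8 bytes, least significant first, and the two-argument lambda receives the byte and the running accumulator. `if0` is taken to select the `then` branch when its condition is zero.

```python
MASK = (1 << 64) - 1  # all values are 64-bit vectors

OP1 = {
    "not": lambda a: ~a & MASK,
    "shl1": lambda a: (a << 1) & MASK,
    "shr1": lambda a: a >> 1,
    "shr4": lambda a: a >> 4,
    "shr16": lambda a: a >> 16,
}
OP2 = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
    "plus": lambda a, b: (a + b) & MASK,
}


def evaluate(e, env):
    """Evaluate expression tree `e` under `env` (variable name -> value)."""
    if isinstance(e, int):   # the constants 0 and 1
        return e
    if isinstance(e, str):   # a variable
        return env[e]
    op = e[0]
    if op == "if0":          # ("if0", cond, then, else)
        branch = e[2] if evaluate(e[1], env) == 0 else e[3]
        return evaluate(branch, env)
    if op == "fold":         # ("fold", source, init, (byte_var, acc_var), body)
        _, source, init, (byte_var, acc_var), body = e
        value, acc = evaluate(source, env), evaluate(init, env)
        for i in range(8):   # ASSUMPTION: 8 bytes, least significant first
            byte = (value >> (8 * i)) & 0xFF
            acc = evaluate(body, {**env, byte_var: byte, acc_var: acc})
        return acc
    if op in OP1:
        return OP1[op](evaluate(e[1], env))
    return OP2[op](evaluate(e[1], env), evaluate(e[2], env))


def run(program, x):
    """Apply a program ("lambda", var, body) to the 64-bit input x."""
    _, var, body = program
    return evaluate(body, {var: x & MASK})


# λx. if0 (x and 1) then x else (x plus 1)
round_up_to_even = ("lambda", "x",
                    ("if0", ("and", "x", 1), "x", ("plus", "x", 1)))
# λx. fold x 0 λy z. (y plus z): sums the bytes of x under the assumed semantics
byte_sum = ("lambda", "x", ("fold", "x", 0, ("y", "z"), ("plus", "y", "z")))
```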

2. NEED TO SCALE TO MILLIONS OF REQUESTS

PLAYER: Ah ha! I guess A = λx. if x & 1 = 0 then x else x + 1

query.smt2: A ≈ λx. e ?

Yes! → GAME: Yep! That's right! You score one point.

No! Counterexample: A(17) <> (λx.e) 17 → GAME: No dice! A(17)=18.

ELASTIC SCALING ON THE WINDOWS AZURE CLOUD

We were set up to run Z3 on up to 128 cores on Azure

THROTTLING REQUESTS

Each team was assigned an authorization token

Tokens were distributed in a pre-registration phase (loud complaints about this!)

Token granted a team the ability to make 5 requests/20 seconds

Z3 given 20 seconds to decide equivalence, but typically completed in less than 5 seconds
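
One way to enforce a per-token limit such as 5 requests per 20 seconds is a sliding window over recent request times. The Python sketch below is an illustration only: the `Throttle` class is a hypothetical name, the deck does not say which throttling scheme the contest server used, and timestamps are passed in explicitly so the behavior is easy to check.

```python
from collections import defaultdict, deque


class Throttle:
    """Sliding-window limiter: at most `limit` requests per `window` seconds."""

    def __init__(self, limit=5, window=20.0):
        self.limit = limit
        self.window = window
        self.history = defaultdict(deque)  # token -> times of accepted requests

    def allow(self, token, now):
        """Return True and record the request if `token` is under its limit."""
        recent = self.history[token]
        # Forget requests that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) < self.limit:
            recent.append(now)
            return True
        return False


throttle = Throttle()
decisions = [throttle.allow("team-a", t) for t in (0, 1, 2, 3, 4, 5, 20)]
```

Rejected requests are not recorded, so a team that keeps retrying while throttled does not push its own window further out.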

PEAK: 40 REQUESTS/SECOND ON 23 CORES

Z3 HANDLED A MILLION REQUESTS

Z3 received approx. 1 million requests over the weekend

Successfully decided all except ~300 in less than 20 seconds (many in just milliseconds)

Timeouts did not contribute to score

But, scores were adjusted slightly after the end of the competition

No team's position changed

3. NEED TO GENERATE ~100K PROBLEM INSTANCES

GAME: I have a secret program A. Can you guess what it is? You have 5 minutes.

1400 RANDOMLY GENERATED PROBLEMS ASSIGNED TO EACH TEAM

Categorized by size and whether or not the program contains fold

In total: 70 categories

Low barrier to entry: 300 problems are really easy to solve

Increasing difficulty

With some cleverness, about 800 could be solved

Remaining 300 are super-hard (at least for us)

1400 RANDOMLY GENERATED PROBLEMS ASSIGNED TO EACH TEAM

Categorized by size and whether or not the program contains fold

In total: 70 categories

Contestants needed to balance risk vs. reward

A large random program may be semantically equivalent to a small one

But, also a bit noisy

+400 BONUS PROBLEMS BUILT FROM HARD NUGGETS

Exactly the same 400 assigned to all teams

Aim to differentiate the best teams

Randomly generate 1000s of nuggets {p1, …, pn} each of size 14

Use Z3 to prove that there exists no program of size 12 or less equivalent to any of the nuggets

Build larger programs from nuggets: if0 pi then pj else pk
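
The generate-and-compose steps above can be sketched in Python as follows. The sketch covers only random generation of size-14 expressions and the `if0` composition; the Z3 step that filters out nuggets with a smaller equivalent is not reproduced. The size measure (one per leaf or operator), the fold-free generator, and the names `random_expr` and `size` are all assumptions made for illustration.

```python
import random

OP1 = ["not", "shl1", "shr1", "shr4", "shr16"]
OP2 = ["and", "or", "xor", "plus"]


def size(e):
    """Assumed size measure: 1 per leaf, 1 per operator or if0 node."""
    if isinstance(e, (int, str)):
        return 1
    return 1 + sum(size(child) for child in e[1:])


def random_expr(n, rng):
    """Return a random fold-free expression tree of exactly size n."""
    if n == 1:
        return rng.choice([0, 1, "x"])
    kinds = ["op1"] + (["op2"] if n >= 3 else []) + (["if0"] if n >= 4 else [])
    kind = rng.choice(kinds)
    if kind == "op1":
        return (rng.choice(OP1), random_expr(n - 1, rng))
    if kind == "op2":
        a = rng.randint(1, n - 2)  # left operand size; right gets the rest
        return (rng.choice(OP2), random_expr(a, rng),
                random_expr(n - 1 - a, rng))
    a = rng.randint(1, n - 3)      # condition size
    b = rng.randint(1, n - 2 - a)  # then-branch size
    return ("if0", random_expr(a, rng), random_expr(b, rng),
            random_expr(n - 1 - a - b, rng))


rng = random.Random(2013)
nuggets = [random_expr(14, rng) for _ in range(1000)]
p_i, p_j, p_k = rng.sample(nuggets, 3)
bonus = ("if0", p_i, p_j, p_k)  # larger program built from three nuggets
```

Under this size measure, a program built from three size-14 nuggets has size 1 + 3 × 14 = 43.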

WHAT WE USED

Z3, F#, TypeScript, JavaScript, TouchDevelop, and Windows Azure are great tools for organizing a programming contest!

WINNERS

JUDGES' PRIZE: KUMA-

Yusuke Endoh and Nayuko Watanabe are an extremely cool bunch of hackers!

We were particularly impressed by your compact and elegant Ruby code and are surprised that a scripting language could perform well enough to be competitive at this computationally intensive task.

That's great validation for the new generational GC produced by you and other Ruby implementers. Congratulations!

RGenGC was developed by Koichi Sasada

Awarded $250

LIGHTNING DIVISION WINNER: ITF

C++ is very suitable for rapid prototyping.

Kojiro Izuka, Hiroshi Maeda, Ryosuke Kayanagi

University of Tsukuba, Japan

Awarded $250

3RD PLACE: HACK THE LOOP

C#, C++, bash, awk, sed, and Excel are not too shabby

Pavel Egorov, Andrew Kostousov, Alexey Mogilnikov, Sergey Azovskov, Alexey Buslavyev, Kseniya Zhagorina, Denis Dublennyh, Eugeny Klyukin, Maxim Sannikov, Vladislav Isenbaev

SKB Kontur, QRGL, Facebook

Russian Federation

Awarded $250

DECLINED!

Our team decided not to claim our prize. We would be glad if our prize will go to the needs of orphans, homeless children, functional programmers in need or other type of charity.

2ND PLACE: F5 ATTACKERS

C++ and Python are fine programming tools for many applications

Noriyuki Futatsugi, Takashi Nakamura, Tai Fukuzawa, Nobuaki Tanaka, Takaaki Hiragushi

Fixstars Corporation and University of Tsukuba

Japan

Awarded $500

WINNER: UNAGI—THE SYNTHESIS

Java, C#, C++, PHP, Ruby, and Haskell are programming tools of choice for discriminating hackers

Takuya Akiba, Yoichi Iwata, Kentaro Imajo, Toshiki Kataoka, Naohiro Takahashi, Hiroaki Iwami

University of Tokyo, Google, Keio University and AtCoder

Japan

Awarded $1000

Thanks to SIGPLAN, John Tristan and Greg Morrisett for managing all the issues related to prizes

UNAGI'S SOLUTION: SCORE 1696/1800

BRUTE FORCE + PRUNING + MULTIPLE STRATEGIES IN PARALLEL RUNNING IN THE EC2 CLOUD

• ~(~x) = x
• ~(if0 x (~y) z) = if0 x y ~z
• ((x<<1)>>1)<<1 = x<<1
• ((x>>1)<<1)>>1 = x>>1
• (x>>4)>>1 = (x>>1)>>4
• y>>16 = 0 (where y is a left variable of fold)
• y&x = x&y
• x&x = x
• x&~0 = x
• x&0 = 0
• (y&(x&z)) = x&(y&z)
• ~x&x = 0
• 1&(x<<1) = 0
• x^~y = ~(x^y)
• if0 constant x y = x (or y)
• if0 x y y = y
• if0 x 0 x = x
• if0 x x y = if0 x 0 y
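
Identities like these are what let a search skip redundant candidates, and they are cheap to sanity-check before relying on them. The Python sketch below tests a subset of the listed rules on random 64-bit values mixed with corner cases. It is a randomized check, not a proof, and it is not taken from the team's code. The helper names are hypothetical, and `if0 c t e` is assumed to pick `t` when `c` is zero.

```python
import random

MASK = (1 << 64) - 1


def inv(a):
    """Bitwise not on 64 bits."""
    return ~a & MASK


def shl1(a):
    """Shift left by one, truncated to 64 bits."""
    return (a << 1) & MASK


def if0(c, t, e):
    """Assumed semantics: pick t when c is zero, else e."""
    return t if c == 0 else e


# (description, left-hand side, right-hand side), each a function of x, y, z
RULES = [
    ("~(~x) = x", lambda x, y, z: inv(inv(x)), lambda x, y, z: x),
    ("~(if0 x (~y) z) = if0 x y ~z",
     lambda x, y, z: inv(if0(x, inv(y), z)),
     lambda x, y, z: if0(x, y, inv(z))),
    ("((x<<1)>>1)<<1 = x<<1",
     lambda x, y, z: shl1(shl1(x) >> 1), lambda x, y, z: shl1(x)),
    ("((x>>1)<<1)>>1 = x>>1",
     lambda x, y, z: shl1(x >> 1) >> 1, lambda x, y, z: x >> 1),
    ("(x>>4)>>1 = (x>>1)>>4",
     lambda x, y, z: (x >> 4) >> 1, lambda x, y, z: (x >> 1) >> 4),
    ("(y&(x&z)) = x&(y&z)",
     lambda x, y, z: y & (x & z), lambda x, y, z: x & (y & z)),
    ("~x&x = 0", lambda x, y, z: inv(x) & x, lambda x, y, z: 0),
    ("1&(x<<1) = 0", lambda x, y, z: 1 & shl1(x), lambda x, y, z: 0),
    ("x^~y = ~(x^y)",
     lambda x, y, z: x ^ inv(y), lambda x, y, z: inv(x ^ y)),
    ("if0 x y y = y", lambda x, y, z: if0(x, y, y), lambda x, y, z: y),
    ("if0 x 0 x = x", lambda x, y, z: if0(x, 0, x), lambda x, y, z: x),
    ("if0 x x y = if0 x 0 y",
     lambda x, y, z: if0(x, x, y), lambda x, y, z: if0(x, 0, y)),
]


def find_counterexample(lhs, rhs, rng, trials=2000):
    """Return an (x, y, z) where the two sides differ, or None if none is found."""
    corners = [0, 1, MASK]
    for _ in range(trials):
        x, y, z = (rng.choice(corners) if rng.random() < 0.3
                   else rng.getrandbits(64) for _ in range(3))
        if lhs(x, y, z) != rhs(x, y, z):
            return (x, y, z)
    return None


rng = random.Random(0)
failures = {}
for name, lhs, rhs in RULES:
    cex = find_counterexample(lhs, rhs, rng)
    if cex is not None:
        failures[name] = cex
```

Corner values such as 0 are mixed in deliberately, because the `if0` rules only exercise their zero branch when the condition is exactly zero, which random 64-bit values almost never are.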

WE AREN'T QUITE DONE WITH THIS YET

Lots of data to analyze

Many different strategies employed, but many similar ones too

Can we reverse engineer/categorize strategies from logs?

Many other program synthesizers around (including several in RiSE)

Tune them up and run them against this problem set

LOOKING AHEAD

72K PROGRAMMER-HOURS IS A VALUABLE RESOURCE

LET'S MAKE GOOD USE OF IT!

WHAT QUESTIONS COULD WE ASK IN THE FUTURE?

CROWD-SOURCED PROGRAM DEVELOPMENT/BUG-FINDING?

INVARIANT DISCOVERY?

SEARCHING FOR INTERPOLANTS?

…?
