Programming Languages Research and You: What Miracles Are We Cooking Up These Days?


Upload: patricia-mitchell

Post on 31-Dec-2015


#1

Programming Languages Research and You:
What Miracles Are We Cooking Up These Days?

#2

Dog and Pony Show

High-Level Summary
• I am Wes Weimer
• I do PL research (plays well with others)
• I may not be as cool as Jack
• But I do have money
• And I’m looking for students
• Break-out session: see ~weimer webpage
– Likely: September 11, 3:30pm

#3

Talk Outline
• What is PL research in general?
• What have we done in the past?
• Possible cool future research …
• Hint: write down a key phrase, email or talk to me later …

Professor? an grad student …

projects.

#4

Don’t We Already Have Compilers?

#5

Dismal View Of PL Research

C++

Java (or C#)

#6

PL Research: Qu’est-ce que c’est?
• Study programs and languages
• 2002 US Annual Cost of Software Errors: $60B
– 0.6% of the GDP (NIST)
– Cost of 1 bug: $2k-$10k
• Programs as artifacts
– What should they be doing?
– Are they doing it?
– Are they making mistakes instead?
– How might we fix them?
• Language Design
– Make some things easier, e.g., compare Ruby / Python to C++

#7

Program Analyses
• We write programs that analyze (or transform) other programs
– cf. testing, >50% of a project’s budget
– Alias analyses, shape analyses, verifiers, …
• Doomed in theory but successful in practice

Simplest examples: dataflow analyses and type systems
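To make "dataflow analysis" concrete, here is a minimal sketch of one: a forward "may be uninitialized" analysis over a tiny control-flow graph. The graph encoding, the function name `maybe_uninitialized`, and the example program are all made up for illustration; real analyses work on a compiler's intermediate representation.

```python
from collections import deque

def maybe_uninitialized(cfg, entry, all_vars):
    """Forward 'may' dataflow analysis.

    cfg maps a node name to (defs, uses, successors). The fact tracked at
    each node is the set of variables that might still be unassigned when
    control reaches it. Uses are assumed to happen before defs in a node.
    """
    # Start with "nothing is suspect" everywhere except the entry node.
    in_facts = {node: set() for node in cfg}
    in_facts[entry] = set(all_vars)
    worklist = deque([entry])
    while worklist:
        node = worklist.popleft()
        defs, _uses, succs = cfg[node]
        out = in_facts[node] - set(defs)
        for succ in succs:
            # Join is set union: unassigned on ANY incoming path is suspect.
            if not out <= in_facts[succ]:
                in_facts[succ] |= out
                worklist.append(succ)
    # Report every use of a variable that may be unassigned at that node.
    return {(node, var)
            for node, (_defs, uses, _succs) in cfg.items()
            for var in uses if var in in_facts[node]}

# Hypothetical program: x is assigned on only one branch, then read.
cfg = {
    "entry": ([], [], ["then", "else"]),
    "then": (["x"], [], ["join"]),
    "else": ([], [], ["join"]),
    "join": (["y"], ["x"], []),
}
warnings = maybe_uninitialized(cfg, "entry", ["x", "y"])
print(warnings)
```

The analysis over-approximates: it warns whenever some path might leave `x` unassigned, even if that path can never run. That trade is how such analyses stay useful despite being "doomed in theory".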

#8

Domain-Specific Bug-Finding

• Embedded components (e.g., cellphones) are programmed with special languages

• Most large projects include their own custom languages (e.g., simulations, macros, mIRC scripts, game engines)

• These are harder to debug and have special semantics (= meanings)

• Example: UnrealScript is C-ish but has type qualifiers like transient and travel

• Example: “Players of [The Sims 2] are complaining that their artfully-crafted homes and mansions are beginning to resemble the Twilight Zone, thanks to an artifact of the game's design that causes hacks to spread like viruses from user to unwitting user.” SecurityFocus 2004-2005

• My research: found >800 real bugs in 4M LOC, proposed a new language feature to fix all of those errors, evaluated it with case studies

Work with Jason Lawrence to apply PL techniques to vertex shader code (e.g., caching, generating, optimization, …)

#9

Big Example #1: CCured
• Make systems programs as safe as Java but as fast as C
– Safe = memory safety and type safety
• Take an important C program (e.g., apache, bind, openssl)
• Run a program analysis to classify all of the pointers in that program:
– Safe Pointer = no arithmetic, no casts
– Sequence Pointer = pointer arithmetic (i++), no casts
– Wild Pointer = anything goes
• Take that classification and transform the program:
– Safe Pointer = add a null check
– Sequence Pointer = add bounds (and null) checks
– Wild Pointer = add full dynamic type checking
• Resulting program is provably safe
• But is < 30% slower than the original (cf. Purify: 50x slower)
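The classify-then-transform recipe can be sketched in a few lines. This is a toy stand-in, not CCured itself: it assumes each pointer has already been summarized as a set of operations ("deref", "arith", "cast" are labels invented here), whereas the real inference is more involved (for instance, it has to follow how pointers flow into one another).

```python
# Run-time checks per pointer kind, taken from the slide's three cases.
CHECKS = {
    "SAFE": ["null check"],
    "SEQ": ["null check", "bounds check"],
    "WILD": ["full dynamic type check"],
}

def classify(ops):
    """Map the operations seen on one pointer to a pointer kind."""
    if "cast" in ops:
        return "WILD"   # anything goes
    if "arith" in ops:
        return "SEQ"    # pointer arithmetic, no casts
    return "SAFE"       # no arithmetic, no casts

def plan_checks(pointer_ops):
    """Return {pointer name: (kind, run-time checks to insert)}."""
    plan = {}
    for name, ops in pointer_ops.items():
        kind = classify(ops)
        plan[name] = (kind, CHECKS[kind])
    return plan

# Hypothetical summary of three pointers in some C program.
example = {
    "p": {"deref"},
    "buf": {"deref", "arith"},
    "obj": {"deref", "cast"},
}
plan = plan_checks(example)
for name, (kind, checks) in plan.items():
    print(name, kind, checks)
```

The point of the classification is cost: the cheaper the kind, the cheaper the inserted check, so a program whose pointers are mostly SAFE pays very little for safety.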

#10

Big Example #2: Specification Mining
• In order to find bugs automatically, we must know what the program should be doing
• Formal partial-correctness specification
• Problem: hard to get them in practice
• Our approach: learn them (machine learning)
• Learn English grammar from high school student essays …
• Take advantage of program structure and error information (e.g., program is more likely to make a mistake near unexpected errors)
• On 1MLOC, our specification miner
– learns 5.3x as many real specs as the next best miner
– has 25x fewer false positives
– those mined specs found 430 real bugs (vs 50-auto or 172-by-hand)

[Figure: finite-state machine for a mined specification, with states 1 to 5 and transitions labeled SF.openSession, S.beginTransaction, T.commit, T.rollback, S.close, and error]
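To give a feel for "learning" a specification from traces, here is a deliberately naive miner. It is not the miner described on this slide: it ignores program structure and error information and only looks for pairs (a, b) where every a is eventually followed by b. The traces are invented, reusing the event names from this slide.

```python
from itertools import permutations

def mine_pairs(traces):
    """Return pairs (a, b) such that, in every trace, each occurrence
    of event a is followed later in the same trace by event b."""
    events = sorted({e for trace in traces for e in trace})
    specs = set()
    for a, b in permutations(events, 2):
        holds = all(b in trace[i + 1:]
                    for trace in traces
                    for i, e in enumerate(trace) if e == a)
        if holds:
            specs.add((a, b))
    return specs

def violated(trace, specs):
    """Return the mined pairs that a single trace fails to respect."""
    return {(a, b) for (a, b) in specs
            if any(e == a and b not in trace[i + 1:]
                   for i, e in enumerate(trace))}

# Hypothetical "good" traces to learn from.
traces = [
    ["SF.openSession", "S.beginTransaction", "T.commit", "S.close"],
    ["SF.openSession", "S.beginTransaction", "T.rollback", "S.close"],
]
specs = mine_pairs(traces)

# Hypothetical buggy trace: the session is never closed.
buggy = ["SF.openSession", "S.beginTransaction", "T.rollback"]
print(sorted(violated(buggy, specs)))
```

A miner this simple drowns in false positives on real code, because many events follow each other by coincidence. Cutting those down is exactly where the extra signals mentioned above earn their keep.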

#11

Big Example #3: Error Reporting

• PL + Theory + Software Engineering
• Even when we find bugs automatically, they often are not fixed!
• Bug reports are too confusing
• Our approach: include a candidate patch that provably fixes that bug and introduces no new known bugs
• Helps maintainer to understand code and bug report – bug is more likely to be fixed

#12

How would that work?
• Somewhat like a spell-checker
• Take the violated specification and construct a new FSM that will find the closest “not mistake” to the mistake we’re fixing

• This gives a plan for fixing the bug: use PL to match that plan back to the source code

• Does it work? Trial with 76 bug reports. 66% of those with patches were addressed, vs. only 21% of those without (statistically significant)

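[Figure: the specification FSM with states A, B, C, D and transitions x, y, z, shown next to the repair FSM built from it. The repair FSM replicates states A to D at levels 0, 1, and 2, with ε transitions for insertions (ins x, ins y, ins z) and transitions for deletions (del x, del y, del z).]

Given bug “xz” we produce “x(ins y)z”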
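The spell-checker idea can be sketched as a shortest-path search. Assume the specification is the chain A, B, C, D with edges x, y, z (read off the figure labels), matching an event is free, and each insertion or deletion costs 1. The function name `repair` and the encoding of the FSM as nested dictionaries are choices made for this sketch.

```python
import heapq

def repair(fsm, start, accepting, trace):
    """Find a cheapest edit plan that makes `trace` accepted by `fsm`.

    fsm maps state -> {event: next_state}. Search nodes are
    (state, position in trace); Dijkstra finds the fewest edits.
    """
    tie = 0  # unique tie-breaker so the heap never compares plans
    heap = [(0, tie, start, 0, [])]
    seen = set()
    while heap:
        cost, _, state, pos, plan = heapq.heappop(heap)
        if pos == len(trace) and state in accepting:
            return cost, plan
        if (state, pos) in seen:
            continue
        seen.add((state, pos))
        moves = []
        if pos < len(trace):
            event = trace[pos]
            if event in fsm.get(state, {}):
                # The event matches the specification: keep it for free.
                moves.append((cost, fsm[state][event], pos + 1, plan + [event]))
            # Drop the event from the trace.
            moves.append((cost + 1, state, pos + 1, plan + [f"(del {event})"]))
        for event, nxt in fsm.get(state, {}).items():
            # Insert a missing event: advance the FSM, consume no input.
            moves.append((cost + 1, nxt, pos, plan + [f"(ins {event})"]))
        for c, s, p, pl in moves:
            tie += 1
            heapq.heappush(heap, (c, tie, s, p, pl))
    return None  # no accepting state is reachable

spec = {"A": {"x": "B"}, "B": {"y": "C"}, "C": {"z": "D"}}
cost, plan = repair(spec, "A", {"D"}, ["x", "z"])
print(cost, "".join(plan))  # prints: 1 x(ins y)z
```

The plan is only half the job: "(ins y)" still has to be mapped back to a place in the source code where the missing call belongs.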

#13

Program Analyses For Security
• Don’t want rogue programs to send our info to MS or turn us into botnet zombies
• Could we detect that (type systems for secure information flow, format string vulnerabilities, setuid analyses, …)?
• Could we prevent that (bytecode verifiers, proof-carrying code, “data execution prevention”, …)?

• Example: buffer overruns and remote code injections will soon be defeated. We believe the next wave of attacks will go after non-control data. We have a program analysis to find such critical data and a transformation to protect it automatically (using OS and VM hardware support).

• Dave Evans and I have grant money for a student to work on any project that combines PL and Security

#14

Program Analysis And Privacy
• Project with Nina Mishra
• Specification mining problem: need traces!
• Can mutually distrusting parties work together to learn specifications without giving away who has more bugs? Yes!

• Can Google, MSN, Yahoo, etc., share search, advertising, and click-through data in such a way that they can still make advertising money and defeat click-fraud, but without giving away who you are?

#15

How Do We Do Research?
• Analysis and Design – create type systems, invent transformations, take ideas from field X and use them in field Y, …

• Proofs – structural induction, type safety, construction, … (we’ve got Greek letters)

• Experiments – build systems and test them, falsifiable claims, reproducibility, …

#16

PL: Cosmic Mayonnaise

• Two favorite areas? No problem!
• Since most of computing involves programs, it’s easy to form a research project that crosscuts PL and …
– Security, Embedded Systems, Software Engineering: as before
– Systems: analyze J2EE ecommerce apps, distributed peer-to-peer programs, “managed code operating systems”, concurrency, etc.
– Graphics: analyze the OpenGL or Direct3D aspects of programs, provide better support for programming on graphics cards, …
– Databases: add transactional or ACID semantics to languages, verify inlined SQL, support persistent objects, use DB techniques on program traces, safely inject query plans
– Theory: we make heavy use of DFAs (lexing), PDAs (parsing), NFAs (policies), linear logic (resource mgmt), temporal logic (fairness), approximation algorithms (more than graph-coloring register alloc)
– Machine Learning: specification mining, profiling, as before, …
– Other: out of space on the slide …

#17

Faked Photo-Ops
• Showing me having fun with my grad students in exotic locales …

#18

The Breakfast of Champions

• At PL Research, we’ve pretty much got it all: theory and practice, glitzy killer apps and hard-core fundamental problems. There’s a lot to do, and that’s why we need people like you.

• Talk to your doctor of philosophy to see whether PL® is right for you. Side effects were generally mild and included reliable software, resistance to viruses, increased hacking opportunities, decreased development times, disappearing deadlocks and race conditions, ironclad APIs, firmer theoretical bases, better specifications, more privacy, …

#19

Summary
• Wes Weimer – PL Research – Wants Students
• Program Analyses – find critical data, analyze shader code, …
• Program Transformations – CCured, security, fixing bugs, better bug reports, …
• Bug-Finding – security bugs, type safety, memory safety, resource usage, …
• Specification Mining – partial correctness
• PL And Privacy – specifications, search data
• PL For Security – non-control data attacks, …

#20

Any Questions?
• Even if you don’t care about PL (sigh!) I would be happy to give advice about CS research, industry and grad school.

• Want more info? This was just an appetizer!

• Breakout session (likely September 11, 3:30)

#21

Big Example #X: SLAM
• Verify critical properties of software or find bugs
• Take an important program (e.g., a device driver)
• Merge it with a property (e.g., no deadlocks, asynchronous IRP handling, BSD sockets, database transactions, …)
• Transform the result into a boolean program
– Same control flow, but only boolean variables
• Use a model checker to explore the resulting state space
– Result 1: program provably satisfies property
– Result 2: program violates property right here on line 92,376!
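The last step, exploring the state space of a boolean program, can be sketched with a plain breadth-first search. This is only an illustration of the idea, not SLAM: it assumes the program has already been reduced to a control-flow graph with a single boolean (`locked`), and the property is a made-up locking discipline (never lock twice, never unlock when unlocked).

```python
from collections import deque

def check_locking(edges, entry):
    """Explore all (program point, locked?) states reachable from entry.

    edges maps a program point to a list of (action, next point), where
    action is "lock", "unlock", or "skip". Returns None if the property
    holds, otherwise the list of program points leading to the violation.
    """
    start = (entry, False)
    parent = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        state = queue.popleft()
        pc, locked = state
        for action, nxt in edges.get(pc, []):
            bad = (action == "lock" and locked) or \
                  (action == "unlock" and not locked)
            if bad:
                # Walk parent links back to the entry for a counterexample.
                path = [nxt]
                cur = state
                while cur is not None:
                    path.append(cur[0])
                    cur = parent[cur]
                return path[::-1]
            if action == "lock":
                new_locked = True
            elif action == "unlock":
                new_locked = False
            else:
                new_locked = locked
            succ = (nxt, new_locked)
            if succ not in parent:
                parent[succ] = state
                queue.append(succ)
    return None

# Hypothetical program: one branch unlocks early, then both unlock again.
edges = {
    0: [("lock", 1)],
    1: [("skip", 2), ("skip", 3)],
    2: [("unlock", 3)],
    3: [("unlock", 4)],
}
print(check_locking(edges, 0))
```

Because there are only finitely many (program point, boolean) states, the search always terminates, which is what makes both outcomes possible: a proof when nothing bad is reachable, or a concrete path when something is.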

#22

Helping Out Testing
• Finding bugs (e.g., bugs in Linux, bugs in Windows device drivers, bugs in Java systems software, …)
• Preventing bugs (change the language, or add a step to the “make” process, cf. PREfast)
• Automatically generating test cases
• Limiting test cases that must be run on a check-in