CS 343 presentation Concrete Type Inference Department of Computer Science Stanford University

Upload: catherine-brown

Post on 06-Jan-2018


Page 1

CS 343 presentation: Concrete Type Inference

Department of Computer Science, Stanford University

Page 2

Concrete type analysis… why we care

• Runtime cost of virtual method resolution is high
• Reduction of code size
• Call graphs are needed for interprocedural analysis
• Function inlining
• Inference algorithms are very expensive; coming up with efficient algorithms is the challenge

Page 3

Fast Static Analysis of C++ Virtual Function Calls

Bacon and Sweeney

Page 4

Overview

• Goal: resolving virtual function calls
• Three static analysis algorithms
  – Unique name
  – Class Hierarchy Analysis (CHA)
  – Rapid Type Analysis (RTA)

Page 5

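Example

class A {
public:
    virtual int foo() { return 1; }
};

class B : public A {
public:
    virtual int foo() { return 2; }
    virtual int foo(int t) { return t + 1; }
};

int main() {
    B* p = new B();
    int result1 = p->foo(1);  // B::foo(int)
    int result2 = p->foo();   // B::foo()
    A* q = p;
    int result3 = q->foo();   // dynamic type is B, so B::foo()
    delete p;
}

Class hierarchy (figure): A declares int foo(); B derives from A and declares int foo() and int foo(int)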

Page 6

Unique Name

• Link-time process
• Doesn't require access to source code
• Checks mangled names
• A unique signature means the virtual call can be replaced with a direct call
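To make the unique-name test concrete, here is a minimal sketch that works on a hypothetical list of virtual function definitions rather than on a real object file. Plain signature strings such as `foo(int)` stand in for the class-independent part of the mangled names; `Definition` and `uniqueTarget` are illustrative names, not part of any real linker.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical view of what a link-time tool sees: one entry per virtual
// function body. A real tool compares mangled names; plain signature
// strings stand in for them here.
struct Definition {
    std::string cls;        // defining class, e.g. "B"
    std::string signature;  // name plus parameter types, e.g. "foo(int)"
};

// Returns the defining class if exactly one virtual function in the whole
// program has this signature (the call can then be made direct), else "".
std::string uniqueTarget(const std::vector<Definition>& defs,
                         const std::string& signature) {
    std::string found;
    int count = 0;
    for (const auto& d : defs)
        if (d.signature == signature) {
            found = d.cls;
            ++count;
        }
    return count == 1 ? found : std::string();
}
```

On the example slide, `foo(int)` is defined only in `B`, so `p->foo(1)` can become a direct call, while `foo()` has two definitions and stays virtual.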

Page 7

Class Hierarchy

• Uses the statically declared type together with class hierarchy information
• Builds a call graph
• Replaces a virtual call with a direct call when no class derived from the static type overrides the called method
• Relies on the type safety of the language (downcasts sometimes need to be disabled)
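A minimal sketch of the CHA target computation, assuming a hypothetical in-memory model of the class hierarchy (`Program`, `chaTargets` and the helper functions are illustrative names). A call can be turned into a direct call when the returned set has exactly one element; building the full call graph is omitted.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical program model: direct superclass of each class ("" for a root)
// and the method signatures each class defines or overrides.
struct Program {
    std::map<std::string, std::string> parent;
    std::map<std::string, std::set<std::string>> methods;
};

// True if class c is t or derives (directly or indirectly) from t.
bool isSubclassOf(const Program& p, std::string c, const std::string& t) {
    for (; !c.empty(); c = p.parent.at(c))
        if (c == t) return true;
    return false;
}

// The class whose body runs for `sig` on an object of exact class c:
// the nearest class at or above c that defines it ("" if none).
std::string dispatch(const Program& p, std::string c, const std::string& sig) {
    for (; !c.empty(); c = p.parent.at(c)) {
        auto it = p.methods.find(c);
        if (it != p.methods.end() && it->second.count(sig)) return c;
    }
    return "";
}

// CHA: the receiver may be an instance of the static type or of any class
// derived from it, so collect the dispatch target for each such class.
std::set<std::string> chaTargets(const Program& p, const std::string& staticType,
                                 const std::string& sig) {
    std::set<std::string> targets;
    for (const auto& entry : p.parent)
        if (isSubclassOf(p, entry.first, staticType)) {
            std::string t = dispatch(p, entry.first, sig);
            if (!t.empty()) targets.insert(t);
        }
    return targets;
}
```

With the classes from the example slide, the static type `B` gives `{B}` for `foo()`, so `p->foo()` becomes direct; the static type `A` gives `{A, B}`, so `q->foo()` stays virtual under CHA.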

Page 8

Rapid Type Analysis

• Starts with the call graph generated by CHA
• Prunes the call graph based on static information about which classes are instantiated
• Flow-insensitive, like CHA
  – Results in efficiency
  – Inherits the limitations of flow-insensitive analysis
• Relies on the type safety of the language (downcasts sometimes need to be disabled)
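The pruning step can be sketched by restricting the CHA candidates to classes that are actually instantiated. This is a simplification under stated assumptions: it uses one program-wide set of instantiated classes, which may be coarser than what a full RTA implementation computes, and `Program` and `rtaTargets` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical program model: direct superclass of each class ("" for a root),
// the signatures each class defines, and the classes created with `new`.
struct Program {
    std::map<std::string, std::string> parent;
    std::map<std::string, std::set<std::string>> methods;
    std::set<std::string> instantiated;
};

bool isSubclassOf(const Program& p, std::string c, const std::string& t) {
    for (; !c.empty(); c = p.parent.at(c))
        if (c == t) return true;
    return false;
}

// Nearest class at or above c that defines sig ("" if none).
std::string dispatch(const Program& p, std::string c, const std::string& sig) {
    for (; !c.empty(); c = p.parent.at(c)) {
        auto it = p.methods.find(c);
        if (it != p.methods.end() && it->second.count(sig)) return c;
    }
    return "";
}

// Simplified RTA: like CHA, but only instantiated classes can be receivers,
// so targets reachable only through never-created classes are pruned.
std::set<std::string> rtaTargets(const Program& p, const std::string& staticType,
                                 const std::string& sig) {
    std::set<std::string> targets;
    for (const auto& cls : p.instantiated)
        if (isSubclassOf(p, cls, staticType)) {
            std::string t = dispatch(p, cls, sig);
            if (!t.empty()) targets.insert(t);
        }
    return targets;
}
```

In the example only `B` is ever created with `new`, so for the static type `A` the result is `{B}` and `q->foo()` can be resolved even though CHA cannot resolve it.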

Page 9

Results

• What biases the results? (C++)
• Ran the analysis algorithms on seven real programs of varying size (large to small)
• RTA wins 4 out of 7. Why? Discuss
• Static analysis can fail with certain programming idioms (e.g. base* b = new sub();)
• Code size: often reduced dramatically

Page 10

Practical Virtual Method Call Resolution for Java

Sundaresan et al.

Page 11

Overview

• Studies practical, context-insensitive, flow-insensitive techniques to resolve virtual function calls in Java
• Presents
  – Reaching-type analysis
  – Variable-type analysis
  – Refers-to analysis
• Uses the Soot framework (Jimple intermediate representation)

Page 12

Three Groups of Analysis

• Baseline (discussed previously)
  – Class hierarchy analysis
  – Rapid type analysis
• Reaching type
  – Declared type analysis
  – Variable type analysis (finer-grained, more accurate)
• Refers-to
  – Developed for C but ported to Java

Page 13

Reaching-type Analysis

• Build a type propagation graph
• Initialize the graph with the type information generated by new()
• Propagate type information along directed edges
• Each node is associated with all of its reaching types
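A minimal sketch of the initialization and propagation steps, assuming a hypothetical flow-insensitive program model in which each variable is one node and each assignment `lhs = rhs` is an edge from `rhs` to `lhs`; `TypePropagationGraph`, `addNew` and `addAssign` are illustrative names. Propagation uses a worklist and stops at a fixed point.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical type propagation graph over variable nodes.
struct TypePropagationGraph {
    std::map<std::string, std::set<std::string>> reaching;  // node -> reaching types
    std::map<std::string, std::vector<std::string>> succ;   // edges rhs -> lhs

    // Initialization: `var = new Type` seeds var with Type.
    void addNew(const std::string& var, const std::string& type) {
        reaching[var].insert(type);
    }
    // `lhs = rhs`: every type reaching rhs also reaches lhs.
    void addAssign(const std::string& lhs, const std::string& rhs) {
        succ[rhs].push_back(lhs);
    }

    // Propagation: push types along edges until nothing changes.
    void propagate() {
        std::vector<std::string> work;
        for (const auto& entry : reaching) work.push_back(entry.first);
        while (!work.empty()) {
            std::string n = work.back();
            work.pop_back();
            for (const auto& m : succ[n]) {
                bool changed = false;
                for (const auto& t : reaching[n])
                    changed |= reaching[m].insert(t).second;
                if (changed) work.push_back(m);  // m's successors must be revisited
            }
        }
    }
};
```

For `a = new B; b = new C; x = a; x = b; y = x`, node `y` ends with `{B, C}` while `a` keeps only `{B}`; a call whose receiver node holds a single type can be resolved directly.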

Page 14

Variable and Declared Type Analysis

• Variable type (pg 10)
  – Uses the variable name as the representative
• Declared type (pg 11)
  – Uses the type with which the variable was declared
  – Puts all variables of the same declared type into the same equivalence class
  – Coarser and less precise
• Both algorithms have an initialization phase and a propagation phase
• Size of propagation graph: O(C*Mc) edges
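The difference between the two representatives can be shown with one function that uses either the variable name (variable type analysis) or its declared type (declared type analysis) as the graph node. This is a sketch over a hypothetical program model (`Program`, `reachingTypes`), using simple fixed-point iteration instead of a worklist.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using TypeSet = std::set<std::string>;

// Hypothetical flow-insensitive program: declared type of each variable,
// allocation sites (var = new T) and assignments (lhs = rhs).
struct Program {
    std::map<std::string, std::string> declared;
    std::vector<std::pair<std::string, std::string>> news;     // (var, T)
    std::vector<std::pair<std::string, std::string>> assigns;  // (lhs, rhs)
};

// Reaching types per variable. Variable type analysis: one node per variable.
// Declared type analysis: one node per declared type, shared by its variables.
std::map<std::string, TypeSet> reachingTypes(const Program& p, bool declaredTypeNodes) {
    auto node = [&](const std::string& v) -> std::string {
        return declaredTypeNodes ? p.declared.at(v) : v;
    };
    std::map<std::string, TypeSet> sets;
    for (const auto& n : p.news) sets[node(n.first)].insert(n.second);  // initialization
    for (bool changed = true; changed;) {                                 // propagation
        changed = false;
        for (const auto& a : p.assigns) {
            TypeSet src = sets[node(a.second)];  // copy: lhs and rhs may share a node
            for (const auto& t : src)
                changed |= sets[node(a.first)].insert(t).second;
        }
    }
    std::map<std::string, TypeSet> result;
    for (const auto& d : p.declared) result[d.first] = sets[node(d.first)];
    return result;
}
```

With `a`, `c` and `x` all declared as `A`, `a = new B`, `c = new C` and `x = a`, the variable-based version gives `x` the set `{B}`, while the declared-type version gives every `A` variable `{B, C}`, which is why it is coarser.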

Page 15

Refers-to Analysis

• Takes aliasing into account
• Nodes
  – Reference nodes (locals, parameters, instance fields)
  – Abstract location nodes (heap locations)
• Algorithm: each reference node initially refers to a unique abstract location; assignments merge abstract locations as the algorithm progresses
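The merging step can be sketched as union-find over abstract locations, i.e. a unification-based (Steensgaard-style) scheme; whether the original analysis uses exactly this structure is an assumption. Only `new` sites and reference-to-reference assignments are modeled (no fields, calls or dereferences), and `RefersTo` and its methods are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Union-find over abstract locations; each root location carries the set
// of types of the objects it may hold.
class RefersTo {
    std::map<std::string, std::string> parent_;
    std::map<std::string, std::set<std::string>> types_;

    std::string root(const std::string& loc) {
        auto it = parent_.find(loc);
        if (it == parent_.end()) { parent_[loc] = loc; return loc; }  // fresh location
        if (it->second == loc) return loc;
        std::string next = it->second;
        std::string r = root(next);
        parent_[loc] = r;  // path compression
        return r;
    }
    // Each reference node starts out referring to its own abstract location.
    static std::string locOf(const std::string& ref) { return "loc(" + ref + ")"; }

public:
    void addNew(const std::string& ref, const std::string& type) {
        types_[root(locOf(ref))].insert(type);
    }
    // lhs = rhs: both may now refer to the same object, so merge their locations.
    void addAssign(const std::string& lhs, const std::string& rhs) {
        std::string a = root(locOf(lhs)), b = root(locOf(rhs));
        if (a == b) return;
        parent_[a] = b;
        types_[b].insert(types_[a].begin(), types_[a].end());
        types_.erase(a);
    }
    std::set<std::string> typesOf(const std::string& ref) {
        return types_[root(locOf(ref))];
    }
};
```

After `a = new B`, `b = new C`, `x = a` and `x = b`, all three references share one location with types `{B, C}`; because unification merges in both directions, `a` also picks up `C`, which is the usual price of this kind of merging.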

Page 16

Alternative Approaches

• Type prediction
  – Requires profiling code
  – Makes the common case fast
  – Runtime type test
  – Resolves more calls
• Alias analysis
  – Very expensive (interprocedural, flow-sensitive)
• Sometimes static analysis is not possible, e.g. classes loaded dynamically based on command-line input, or newly available classes. Does anyone see a way to address this?
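Type prediction can be sketched in C++ as a guarded direct call. The sketch assumes profiling showed that the receiver at this call site is usually exactly a `B`; the classes are hypothetical and mirror the earlier example, plus a subclass `C`. The test uses `typeid` for an exact type match, because a test such as `dynamic_cast<B*>` would also accept subclasses of `B` that may override `foo`.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical classes mirroring the example slide, plus a subclass C.
struct A {
    virtual ~A() = default;
    virtual int foo() { return 1; }
};
struct B : A {
    int foo() override { return 2; }
};
struct C : B {
    int foo() override { return 3; }
};

// One call site with type prediction: test for the predicted exact type and
// call its method directly; otherwise fall back to virtual dispatch.
int callFoo(A* a) {
    if (typeid(*a) == typeid(B))
        return static_cast<B*>(a)->B::foo();  // qualified call: no virtual dispatch
    return a->foo();                          // uncommon case: ordinary virtual call
}
```

The direct call `B::foo()` is a candidate for inlining; receivers of any other dynamic type, including `C`, take the virtual call, so the result always matches plain virtual dispatch.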

Page 17

Benchmarks and Results

• Ran on 9 programs, 7 of which are used in the SPECjvm benchmark suite
• Variable type analysis is best at improving call graph precision
• Type-based analysis is more efficient because it builds nodes based on the classes in the program rather than on each individual variable
• Table II shows exact numbers for how many monomorphic edges… So why couldn't they resolve all of these? How did they get this information in the first place?

Page 18

“The Cartesian Product Algorithm”

Simple and Precise Type Inference of Parametric Polymorphism

Page 19

Polymorphism

• Explicit concrete type declarations are undesirable for the programmer
• Algorithms must be used to infer types
• Parametric polymorphism: the ability of routines to be invoked on arguments of several different types
• CPA uses context sensitivity, whereas other inference algorithms do not; this is key because CPA analyzes each context with different code (a separate template)

Page 20

Basic Type Inference Algorithm

• Step 1: Allocate type variables (associate a type var with every slot and expression in the program)

• Step 2: Seed type variables (to capture the initial state of the target program)

• Step 3: Establish constraints, propagate (builds a directed graph that expresses propagation of types through assignments)

• Basic algorithm analyzes polymorphism imprecisely
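The three steps can be sketched with a hypothetical graph of named type variables (`BasicInference`, `seed`, `constrain` are illustrative names). In the usage, a hypothetical identity method `id:` that returns its argument is sent once with a smallInt and once with a float; because the method has a single type variable for its parameter, both sends are inferred to return `{smallInt, float}`, which is the imprecision noted above.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using TypeSet = std::set<std::string>;

// Step 1: type variables are the map keys (one per slot or expression).
// Step 2: seeds give some of them initial types.
// Step 3: constraints are directed edges; propagation runs to a fixed point.
struct BasicInference {
    std::map<std::string, TypeSet> typeVar;
    std::vector<std::pair<std::string, std::string>> edges;  // (from, to)

    void seed(const std::string& v, const std::string& t) { typeVar[v].insert(t); }
    void constrain(const std::string& from, const std::string& to) {
        edges.push_back({from, to});
    }

    void propagate() {
        for (bool changed = true; changed;) {
            changed = false;
            for (const auto& e : edges) {
                TypeSet src = typeVar[e.first];  // copy of the source set
                for (const auto& t : src)
                    changed |= typeVar[e.second].insert(t).second;
            }
        }
    }
};
```

Every send of `id:` shares the nodes `id.x` and `id.result`, so the types of all arguments are mixed together before they flow back to any single call site.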

Page 21

Improvements on Basic Algorithm

• 1-level expansion
  – Different templates for each send
  – Inefficient
• p-level expansion (precise, yet worst-case complexity is exponential)
• Iterative algorithm (precise, more efficient than expansion)

Page 22

Cartesian Product Algorithm

• “There is no such thing as a polymorphic call, only polymorphic call sites”
• Turns the analysis of each send into a case analysis (makes exact type information available for each case immediately and eliminates iteration)
• Maintains per-method pools of templates so that templates can be shared (efficiency)
• Iteration is avoided because of
  – Monotonicity of the cartesian product
  – Monotone context of application (the iterative algorithm is not monotone because comparing types for equality is not a monotone function)
• Efficient and precise (also, no need to expand away inheritance)
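A minimal sketch of CPA's case analysis for a send with one argument, assuming a hypothetical `analyzeBody` callback that stands in for analyzing the method body with exact receiver and argument types. Each (receiver type, argument type) tuple from the cartesian product gets one template, kept in a per-method pool and shared between sends. In the real algorithm the type sets grow during propagation and new tuples get templates as they appear; this sketch processes a send once with its type sets already known.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <utility>

using Type = std::string;
using TypeSet = std::set<Type>;
using Tuple = std::pair<Type, Type>;  // (receiver type, argument type)

// Per-method template pool, sketched for one-argument sends.
class CpaMethod {
    std::function<TypeSet(const Tuple&)> analyzeBody_;  // hypothetical body analysis
    std::map<Tuple, TypeSet> templates_;                 // one template per tuple

public:
    explicit CpaMethod(std::function<TypeSet(const Tuple&)> analyzeBody)
        : analyzeBody_(std::move(analyzeBody)) {}

    // Case analysis over receivers x args: each tuple is analyzed with exact
    // (monomorphic) types or reused, and the send's type is the union.
    TypeSet send(const TypeSet& receivers, const TypeSet& args) {
        TypeSet result;
        for (const auto& r : receivers)
            for (const auto& a : args) {
                Tuple key(r, a);
                auto it = templates_.find(key);
                if (it == templates_.end())
                    it = templates_.emplace(key, analyzeBody_(key)).first;  // new template
                result.insert(it->second.begin(), it->second.end());
            }
        return result;
    }

    std::size_t templateCount() const { return templates_.size(); }
};
```

With a made-up typing rule for `+` (smallInt + smallInt gives smallInt, anything involving a float gives float), a send with types `{smallInt} × {smallInt}` creates one template; a later send with `{smallInt, float} × {smallInt}` reuses it, adds one more, and returns `{smallInt, float}`.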

Page 23

Precision improvements possible?

• Yes
• mod: arg = (self - (arg * (self div: arg)))
• x mod: y, where type(x) = type(y) = {smallInt, float}
• Iterative algorithm infers {smallInt, float}
• CPA infers {smallInt}
• In this case, there is a benefit from having four templates connected, one for each tuple in the product of the types of x and y

Page 24

Results

• “Extractor”: having less precise type information forces it to extract more
• CPA delivers the smallest extractions and the best CPU time of the different algorithms
• How generalizable are the results from the Self system?
• How much type inference is even necessary for the programs they benchmarked (e.g. the Unix diff command)?

Page 25

Thanks
