cil: infrastructure for c program analysis and transformation
Post on 30-Dec-2015
CIL: Infrastructure for C Program Analysis and Transformation
George C. Necula, Scott McPeak, S. P. Rahul, Westley Weimer
http://www.cs.berkeley.edu/~necula/cil
ETAPS – CC ’02 Friday, April 12
What is CIL?
Distills C language into a few key forms with precise semantics
Parser + IR + Program Merger for C
Maintains types, close ties to source
Highly structured, clean subset of C
Handles ANSI/GCC/MSVC
Why CIL?
Analyses and Transformations
Easy to use
  impersonates compiler & linker
  $ make project CC=cil
Easy to work with
  converts away tricky syntax
  leaves just the heart of the language
  separates concepts
C Feature Separation
CIL separates language components
  pure expressions
  statements with side-effects
  control-flow
  embedded CFG
Keeps all programmer names
  temps serialize side-effects
  simplified scoping
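As an illustration of "temps serialize side-effects", here is a hand-written before/after pair. The second function only approximates the shape of a simplified form; the names `original`, `simplified` and `tmp` are assumptions made for this sketch, and the output of the actual tool may differ.

```c
/* Original style: the increment is a side effect buried in an expression. */
int original(const int *buf, int n) {
    int i = 0, sum = 0;
    while (i < n)
        sum += buf[i++] * 2;
    return sum;
}

/* Hand-simplified style: expressions are pure, every side effect is its
   own statement, and a temporary (hypothetical name) fixes the order. */
int simplified(const int *buf, int n) {
    int i = 0, sum = 0;
    int tmp;                      /* temporary that serializes i++ */
    while (1) {
        if (!(i < n)) break;      /* loop test made explicit */
        tmp = i;                  /* value of i before the increment */
        i = i + 1;                /* the side effect, on its own */
        sum = sum + buf[tmp] * 2; /* pure expression on the right */
    }
    return sum;
}
```

Both functions compute the same result; the second is longer but gives an analysis one effect per statement to reason about.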
Example: C Lvalues
An exp referring to a region of storage
Example: rec[1].fld[2]
May involve 1, 2, 3 memory accesses
  1 if rec and fld are both arrays
  2 if either one is a pointer
  3 if rec and fld are both pointers
Syntax (AST) is insufficient
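The point can be seen in plain C: the expression text is identical in both functions below, yet the declarations decide how many loads happen. The type and variable names (`rec_a`, `rec_p`, `arr_rec`, `ptr_rec`) are made up for this sketch, and the access counts in the comments follow the slide's reasoning rather than any particular compiler's output.

```c
struct rec_a { int fld[3]; };   /* fld is an array   */
struct rec_p { int *fld; };     /* fld is a pointer  */

static int storage[3] = {10, 20, 30};

/* rec is an array of structs whose fld is an array. */
static struct rec_a arr_rec[2] = { {{0, 0, 0}}, {{10, 20, 30}} };

/* rec is a pointer to structs whose fld is a pointer. */
static struct rec_p ptr_rec_storage[2] = { {0}, {storage} };
static struct rec_p *ptr_rec = ptr_rec_storage;

/* One access: the address is a constant offset from arr_rec. */
int read_arrays(void) {
    return arr_rec[1].fld[2];
}

/* Three accesses: load ptr_rec, load its fld pointer, load the element. */
int read_pointers(void) {
    return ptr_rec[1].fld[2];
}
```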
CIL Lvalues
An exp referring to a region of storage
lval ::= <base, offset>
base ::= Var(varinfo) | Mem(exp)
offset ::= None | Field(f, offset) | Index(exp, offset)
CIL Lvalues
Example: rec[1].fld[2] becomes either:
  <Var(rec), Index(1, Field(fld, Index(2, None)))>
or:
  <Mem(2 + Lvalue(<Mem(1 + Lvalue(<Var(rec), None>)), Field(fld, None)>)), None>
Full static and operational semantics
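A rough C rendering of this grammar shows why the explicit form helps: the number of memory accesses can be read off the tree. CIL is not written in C, so every type and function name below is hypothetical; only the shape of the grammar is taken from the slides.

```c
#include <stddef.h>

struct exp;
struct offset {
    enum { O_NONE, O_FIELD, O_INDEX } kind;
    const char *field;           /* O_FIELD */
    const struct exp *index;     /* O_INDEX */
    const struct offset *rest;   /* O_FIELD, O_INDEX */
};
struct lval {
    enum { B_VAR, B_MEM } base;
    const char *var;             /* B_VAR */
    const struct exp *addr;      /* B_MEM */
    const struct offset *off;
};
struct exp {
    enum { E_CONST, E_LVAL, E_ADD } kind;
    long value;                  /* E_CONST */
    const struct lval *lv;       /* E_LVAL */
    const struct exp *lhs, *rhs; /* E_ADD */
};

int lval_accesses(const struct lval *lv);

/* Memory accesses needed to evaluate an expression. */
int exp_accesses(const struct exp *e) {
    switch (e->kind) {
    case E_CONST: return 0;
    case E_LVAL:  return lval_accesses(e->lv);
    case E_ADD:   return exp_accesses(e->lhs) + exp_accesses(e->rhs);
    }
    return 0;
}

/* One access for the lvalue itself, plus whatever its address needs. */
int lval_accesses(const struct lval *lv) {
    int n = 1;
    const struct offset *o;
    if (lv->base == B_MEM)
        n += exp_accesses(lv->addr);
    for (o = lv->off; o != NULL && o->kind != O_NONE; o = o->rest)
        if (o->kind == O_INDEX)
            n += exp_accesses(o->index);
    return n;
}

static const struct offset none = { O_NONE, NULL, NULL, NULL };
static const struct exp one = { E_CONST, 1, NULL, NULL, NULL };
static const struct exp two = { E_CONST, 2, NULL, NULL, NULL };

/* <Var(rec), Index(1, Field(fld, Index(2, None)))> */
static const struct offset idx2 = { O_INDEX, NULL, &two, &none };
static const struct offset fld_idx2 = { O_FIELD, "fld", NULL, &idx2 };
static const struct offset idx1 = { O_INDEX, NULL, &one, &fld_idx2 };
static const struct lval all_arrays = { B_VAR, "rec", NULL, &idx1 };

/* <Mem(2 + Lvalue(<Mem(1 + Lvalue(<Var(rec), None>)), Field(fld, None)>)), None> */
static const struct lval rec_var = { B_VAR, "rec", NULL, &none };
static const struct exp rec_val = { E_LVAL, 0, &rec_var, NULL, NULL };
static const struct exp rec_plus_1 = { E_ADD, 0, NULL, &one, &rec_val };
static const struct offset fld_only = { O_FIELD, "fld", NULL, &none };
static const struct lval rec1_fld = { B_MEM, NULL, &rec_plus_1, &fld_only };
static const struct exp fld_val = { E_LVAL, 0, &rec1_fld, NULL, NULL };
static const struct exp fld_plus_2 = { E_ADD, 0, NULL, &two, &fld_val };
static const struct lval all_pointers = { B_MEM, NULL, &fld_plus_2, &none };
```

With these definitions, `lval_accesses(&all_arrays)` counts one access and `lval_accesses(&all_pointers)` counts three, matching the two extremes listed for `rec[1].fld[2]`.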
Semantics
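CIL gives syntax-directed semantics
Example judgment:
$$\frac{\Gamma(x) = \tau}{\Gamma \vdash \mathrm{Var}(x) \Rightarrow (\&x, \tau)}$$
Here $\Gamma$ is the environment, $\mathrm{Var}(x)$ is the lvalue form, and $(\&x, \tau)$ is its meaning.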
CIL Lvalue Semantics
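$$\frac{\Gamma(x) = \tau}{\Gamma \vdash \mathrm{Var}(x) \Rightarrow (\&x, \tau)}$$

$$\frac{\Gamma \vdash e : \mathrm{Ptr}(\tau)}{\Gamma \vdash \mathrm{Mem}(e) \Rightarrow (e, \tau)}$$

$$\frac{\Gamma \vdash b \Rightarrow (a, \tau)}{\Gamma \vdash \mathrm{None}@b \Rightarrow (a, \tau)}$$

$$\frac{\Gamma \vdash b \Rightarrow (a_1, \mathrm{Arr}(\tau_1)) \qquad \Gamma \vdash o@(a_1 + e \cdot |\tau_1|, \tau_1) \Rightarrow (a_2, \tau_2)}{\Gamma \vdash \mathrm{Index}(e, o)@b \Rightarrow (a_2, \tau_2)}$$

$$\frac{\Gamma \vdash o@b \Rightarrow (a, \tau)}{\Gamma \vdash \langle b, o \rangle \Rightarrow (a, \tau)}$$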
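The Index rule says that indexing an array at address a1 by e yields the address a1 plus e times the size of the element type. A minimal C sketch of that address arithmetic follows; the function name `index_address` is an assumption made for illustration.

```c
#include <stddef.h>

/* Address selected by indexing an array that starts at a1 and whose
   elements are elem_size bytes wide: a1 + e * elem_size. */
void *index_address(void *a1, size_t elem_size, size_t e) {
    return (char *)a1 + e * elem_size;
}
```

For an `int arr[4]`, `index_address(arr, sizeof arr[0], 2)` is the same address as `&arr[2]`, which is what the rule assigns to the corresponding lvalue.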
CIL Source Fidelity
CIL output:
struct __anonstruct1 { int fld[3] ;};
typedef struct __anonstruct1 * Myptr;
Myptr rec;
(rec + 2)->fld[1] = (int)'h';
SUIF 2.2.0-4 output:
typedef int __ar_1[3];
struct type_1 { __ar_1 fld; };
struct type_1 * rec;
(((((int *)(((char *)&((((struct type_1 *) (rec))))[2])+0U))))[1]) =(104);
Original source:
typedef struct { int fld[3]; } * Myptr;
Myptr rec;
rec[2].fld[1] = 'h';
Corner Cases
Your analysis will not have to handle:
  return ({goto L; p;}) && ({L: 5;});
  return &(--x ? : z) - & (x++, x);
Full handling of GNU-isms, MSVC-isms
  attributes
  initializers
Corner Cases
Your analysis will not have to handle:
  return ({goto L; p;}) && ({L: 5;});
int tmp;
goto L;
if (p) { L: tmp = 1; }
else { tmp = 0; }
return tmp;
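Wrapped in a function, the lowered statements above are ordinary C and can be compiled and checked. The function name `lowered` and its parameter are assumptions added for this sketch.

```c
/* The lowered form from the slide, placed in a function body. */
int lowered(int p) {
    int tmp;
    goto L;                  /* control enters the branch directly */
    if (p) { L: tmp = 1; }   /* the label sits inside the "then" branch */
    else { tmp = 0; }
    return tmp;
}
```

Because the `goto` jumps straight to `L`, the test on `p` is never evaluated and the function returns 1 for any argument. That matches the original expression, where the jump lands in the second operand of `&&`, whose value 5 is nonzero.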
StackGuard Transform
Cowan et al., USENIX ’98
Buffer overrun defense
  push return address on private stack
  pop before returning
  only change functions with local arrays
40 lines of commented code with CIL
Quite easy: uses visitors for tree replacement, explicit returns, etc.
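To make the idea concrete, here is a sketch of what an instrumented function might look like, written by hand. It is not the transform from the talk: the helper names (`sg_push`, `sg_pop`, `sg_depth`, `copy_len`) are invented, and because portable C cannot overwrite a return address, this version only compares the saved value and aborts on a mismatch. It relies on the GCC/Clang builtin `__builtin_return_address`.

```c
#include <stdlib.h>
#include <string.h>

#define SG_DEPTH 256
static void *sg_stack[SG_DEPTH];   /* private stack of return addresses */
static int sg_top;

static void sg_push(void *ret) {
    if (sg_top == SG_DEPTH) abort();
    sg_stack[sg_top++] = ret;
}

static void *sg_pop(void) {
    if (sg_top == 0) abort();
    return sg_stack[--sg_top];
}

int sg_depth(void) { return sg_top; }

/* A function with a local array, so it is selected for instrumentation. */
size_t copy_len(const char *src) {
    char buf[16];
    size_t result;

    sg_push(__builtin_return_address(0));      /* added at entry */

    strncpy(buf, src, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    result = strlen(buf);

    /* Added before the single explicit return: the saved address must
       still match, otherwise the stack was corrupted. */
    if (sg_pop() != __builtin_return_address(0)) abort();
    return result;
}
```

Having one explicit return per function is what makes the exit check easy to place, which is why the slide mentions explicit returns.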
Other Transforms
Instrument and log all calls: 150 lines
Eliminate break, continue, switch: 110
1 memory access per assignment: 100
Make each function have a single return statement: 90
Make all stack arrays heap-allocated: 75
Log all value/addr memory writes: 45
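As an example of what two of these transforms aim for, the pair below shows a loop before and after `break` and `continue` are replaced by explicit jumps, with a single return at the end. The rewrite is done by hand; the function names and the labels `loop_continue` and `loop_break` are assumptions, not output of the tool.

```c
/* Before: uses continue and break. */
int sum_original(const int *a, int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        if (a[i] < 0) continue;   /* skip negative entries */
        if (a[i] == 0) break;     /* stop at the first zero */
        sum += a[i];
    }
    return sum;
}

/* After: only goto and labels remain, and there is one return. */
int sum_lowered(const int *a, int n) {
    int sum = 0;
    int i = 0;
    int result;
    while (1) {
        if (!(i < n)) goto loop_break;
        if (a[i] < 0) goto loop_continue;
        if (a[i] == 0) goto loop_break;
        sum += a[i];
    loop_continue:
        i++;
    }
loop_break:
    result = sum;
    return result;
}
```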
Whole-Program Merger
C has incremental linking, compilation coupled with a weak module system!
Example (vortex / gcc / c++2c):
/* foo.c */
struct list { int head;
struct list * tail;
};
struct list * mylist;
/* bar.c */
struct chain { int head;
struct chain * tail;
};
extern struct chain * mylist;
Merging a Project
Determine what files to merge
Merge the files
  handle file-scoped identifiers
  C uses name equivalence for types
  but modules need structural equivalence
Key: Each global identifier has 1 type!
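The foo.c / bar.c example works only if `struct list` and `struct chain` are treated as the same type. One way to decide that is a structural comparison that ignores tags and tolerates recursion by remembering which pairs are already being compared. The sketch below is an assumption about how such a check could look, with made-up type descriptors; it does not describe the merger's actual implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum kind { K_INT, K_PTR, K_STRUCT };
struct type;
struct field { const char *name; const struct type *type; };
struct type {
    enum kind kind;
    const char *tag;             /* struct tag, ignored when comparing */
    const struct type *target;   /* K_PTR */
    const struct field *fields;  /* K_STRUCT */
    int nfields;
};

#define MAX_PAIRS 32
struct pairs {
    const struct type *x[MAX_PAIRS];
    const struct type *y[MAX_PAIRS];
    int n;
};

static bool equiv(const struct type *x, const struct type *y, struct pairs *seen) {
    if (x->kind != y->kind) return false;
    if (x->kind == K_INT) return true;
    if (x->kind == K_PTR) return equiv(x->target, y->target, seen);
    /* Structs: a pair already under comparison is assumed equal. */
    for (int i = 0; i < seen->n; i++)
        if (seen->x[i] == x && seen->y[i] == y) return true;
    if (x->nfields != y->nfields || seen->n == MAX_PAIRS) return false;
    seen->x[seen->n] = x;
    seen->y[seen->n] = y;
    seen->n++;
    for (int i = 0; i < x->nfields; i++) {
        if (strcmp(x->fields[i].name, y->fields[i].name) != 0) return false;
        if (!equiv(x->fields[i].type, y->fields[i].type, seen)) return false;
    }
    return true;
}

bool structurally_equal(const struct type *x, const struct type *y) {
    struct pairs seen = { {NULL}, {NULL}, 0 };
    return equiv(x, y, &seen);
}

/* Descriptors for struct list (foo.c) and struct chain (bar.c). */
static struct type list_t, chain_t;   /* tentative, completed below */
static struct type int_t = { K_INT, NULL, NULL, NULL, 0 };
static struct type list_ptr = { K_PTR, NULL, &list_t, NULL, 0 };
static struct type chain_ptr = { K_PTR, NULL, &chain_t, NULL, 0 };
static struct field list_fields[] = { {"head", &int_t}, {"tail", &list_ptr} };
static struct field chain_fields[] = { {"head", &int_t}, {"tail", &chain_ptr} };
static struct type list_t = { K_STRUCT, "list", NULL, list_fields, 2 };
static struct type chain_t = { K_STRUCT, "chain", NULL, chain_fields, 2 };
```

Here `structurally_equal(&list_ptr, &chain_ptr)` holds, so the two declarations of `mylist` can be given one type, as the key point on the slide requires.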
Other Merger Details
Remove duplicate declarations
  every file includes <stdio.h>
Match struct pointer with no defined body in file A to defined body in file B
Be careful when picking representatives
How Does it Work?
Make project, pass all files through CIL
Run your transform and analysis
Emit simplified C
Compile simplified C with GCC/MSVC
… and it works!
Large Programs
Program       #LOC (*.[ch])   Notes
SPECINT95     360K
GIMP-1.2.2    800K            large libraries
linux-2.4.5   2.5M            132% compile time
ACE (in C)    2M              2000 files
Used in the CCured and BLAST projects
Merged Kernel Stats
Stock monolithic Linux 2.4.5 kernel
http://manju.cs.berkeley.edu/cil/vmlinux.c
Statistics:
Before | After
324 files | One 12.5MB file
11.3 M-words | 1.5 M-words
7.3 M-LOC (post-process) | 470 K-LOC
$ make CC="cil --merge" HOSTCC="cil --merge" LD="cil --merge" AR="cil --mode=AR --merge"
Conclusion
CIL distills C to a precise, simple subset
  easy to analyze
  well-defined semantics
  close to the original source
Well-suited to complex analyses and source-to-source transforms
Parses ANSI/GCC/MSVC C
Rapidly merges large programs
Questions?
Try CIL out:
http://www.cs.berkeley.edu/~necula/cil
Complete source, documentation and test cases freely available