basic program analysis - columbia universitybasic program analysis suman jana *some slides are...
TRANSCRIPT
![Page 1: Basic Program Analysis - Columbia UniversityBasic Program Analysis Suman Jana *some slides are borrowed from Baishakhi Ray and Ras Bodik Our Goal Program Analyzer Source code Security](https://reader036.vdocuments.us/reader036/viewer/2022071216/6047e7f12b71314bf0116da9/html5/thumbnails/1.jpg)
Basic Program Analysis SumanJana
*someslidesareborrowedfromBaishakhiRayandRasBodik
![Page 2: Basic Program Analysis - Columbia UniversityBasic Program Analysis Suman Jana *some slides are borrowed from Baishakhi Ray and Ras Bodik Our Goal Program Analyzer Source code Security](https://reader036.vdocuments.us/reader036/viewer/2022071216/6047e7f12b71314bf0116da9/html5/thumbnails/2.jpg)
Our Goal
ProgramAnalyzer
SourcecodeSecuritybugs
Programanalyzermustbeabletounderstandprogramproper?es(e.g.,canavariablebeNULLata
par?cularprogrampoint?)
Mustperformcontrolanddataflowanalysis
![Page 3: Basic Program Analysis - Columbia UniversityBasic Program Analysis Suman Jana *some slides are borrowed from Baishakhi Ray and Ras Bodik Our Goal Program Analyzer Source code Security](https://reader036.vdocuments.us/reader036/viewer/2022071216/6047e7f12b71314bf0116da9/html5/thumbnails/3.jpg)
Do we need to implement control and data flow analysis from scratch?
• Mostmoderncompilersalreadyperformseveraltypesofsuchanalysisforcodeop?miza?on• Wecanhookintodifferentlayersofanalysisandcustomizethem• Wes?llneedtounderstandthedetails
• LLVM(hNp://llvm.org/)isahighlycustomizableandmodularcompilerframework• UserscanwriteLLVMpassestoperformdifferenttypesofanalysis• Clangsta?canalyzercanfindseveraltypesofbugs• Caninstrumentcodefordynamicanalysis
![Page 4: Basic Program Analysis - Columbia UniversityBasic Program Analysis Suman Jana *some slides are borrowed from Baishakhi Ray and Ras Bodik Our Goal Program Analyzer Source code Security](https://reader036.vdocuments.us/reader036/viewer/2022071216/6047e7f12b71314bf0116da9/html5/thumbnails/4.jpg)
Compiler Overview
• AbstractSyntaxTree:SourcecodeparsedtoproduceAST• ControlFlowGraph:ASTistransformedtoCFG• DataFlowAnalysis:operatesonCFG
![Page 5: Basic Program Analysis - Columbia UniversityBasic Program Analysis Suman Jana *some slides are borrowed from Baishakhi Ray and Ras Bodik Our Goal Program Analyzer Source code Security](https://reader036.vdocuments.us/reader036/viewer/2022071216/6047e7f12b71314bf0116da9/html5/thumbnails/5.jpg)
The Structure of a Compiler
5
scanner
parser
checker
codegen
Sourcecode(streamofcharacters)
streamoftokens
AbstractSyntaxTree(AST)
ASTwithannota?ons(types,declara?ons)
Machine/bytecode
![Page 6: Basic Program Analysis - Columbia UniversityBasic Program Analysis Suman Jana *some slides are borrowed from Baishakhi Ray and Ras Bodik Our Goal Program Analyzer Source code Security](https://reader036.vdocuments.us/reader036/viewer/2022071216/6047e7f12b71314bf0116da9/html5/thumbnails/6.jpg)
SyntacBc Analysis
• Input:sequenceoftokensfromscanner• Output:abstractsyntaxtree• Actually,
• parserfirstbuildsaparsetree• ASTisthenbuiltbytransla?ngtheparsetree• parsetreerarelybuiltexplicitly;onlydeterminedby,say,howparserpushesstufftostack
6AdoptedFromUCBerkeley:Prof.BodikCS164Lecture5
![Page 7: Basic Program Analysis - Columbia UniversityBasic Program Analysis Suman Jana *some slides are borrowed from Baishakhi Ray and Ras Bodik Our Goal Program Analyzer Source code Security](https://reader036.vdocuments.us/reader036/viewer/2022071216/6047e7f12b71314bf0116da9/html5/thumbnails/7.jpg)
Example
• SourceCode 4*(2+3)
• ParserinputNUM(4) TIMES LPAR NUM(2) PLUS NUM(3) RPAR
• Parseroutput(AST):
AdoptedFromUCBerkeley:Prof.BodikCS164Lecture5 7
*
NUM(4) +
NUM(2) NUM(3)
![Page 8: Basic Program Analysis - Columbia UniversityBasic Program Analysis Suman Jana *some slides are borrowed from Baishakhi Ray and Ras Bodik Our Goal Program Analyzer Source code Security](https://reader036.vdocuments.us/reader036/viewer/2022071216/6047e7f12b71314bf0116da9/html5/thumbnails/8.jpg)
Parse tree for the example: 4*(2+3)
AdoptedFromUCBerkeley:Prof.BodikCS164Lecture5 8
leavesaretokens
NUM(4)TIMESLPARNUM(2)PLUSNUM(3)RPAR
EXPR
EXPR
EXPR
![Page 9: Basic Program Analysis - Columbia UniversityBasic Program Analysis Suman Jana *some slides are borrowed from Baishakhi Ray and Ras Bodik Our Goal Program Analyzer Source code Security](https://reader036.vdocuments.us/reader036/viewer/2022071216/6047e7f12b71314bf0116da9/html5/thumbnails/9.jpg)
Another example • SourceCode
if(x==y){a=1;}• Parserinput
IF LPAR ID EQ ID RPAR LBR ID AS INT SEMI RBR
• Parseroutput(AST):
AdoptedFromUCBerkeley:Prof.BodikCS164Lecture5 9
IF-THEN
==
ID ID
=
ID INT
![Page 10: Basic Program Analysis - Columbia UniversityBasic Program Analysis Suman Jana *some slides are borrowed from Baishakhi Ray and Ras Bodik Our Goal Program Analyzer Source code Security](https://reader036.vdocuments.us/reader036/viewer/2022071216/6047e7f12b71314bf0116da9/html5/thumbnails/10.jpg)
Parse tree for example: if (x==y) {a=1;}
AdoptedFromUCBerkeley:Prof.BodikCS164Lecture5 10
IFLPARID==IDRPARLBRID=INTSEMIRBR
EXPREXPR
STMT
BLOCK
STMT
leavesaretokens
![Page 11: Basic Program Analysis - Columbia UniversityBasic Program Analysis Suman Jana *some slides are borrowed from Baishakhi Ray and Ras Bodik Our Goal Program Analyzer Source code Security](https://reader036.vdocuments.us/reader036/viewer/2022071216/6047e7f12b71314bf0116da9/html5/thumbnails/11.jpg)
Parse Tree
• Representa?onofgrammarsinatree-likeform.
• Isaone-to-onemappingfromthegrammartoatree-form.
Aparsetreepictoriallyshowshowthestartsymbolofagrammarderivesastringinthe
language.…DragonBook
![Page 12: Basic Program Analysis - Columbia UniversityBasic Program Analysis Suman Jana *some slides are borrowed from Baishakhi Ray and Ras Bodik Our Goal Program Analyzer Source code Security](https://reader036.vdocuments.us/reader036/viewer/2022071216/6047e7f12b71314bf0116da9/html5/thumbnails/12.jpg)
CStatement:returna+2
averyformalrepresenta?onthatstrictlyshowshowtheparser
understandsthestatementreturna+2;
![Page 13: Basic Program Analysis - Columbia UniversityBasic Program Analysis Suman Jana *some slides are borrowed from Baishakhi Ray and Ras Bodik Our Goal Program Analyzer Source code Security](https://reader036.vdocuments.us/reader036/viewer/2022071216/6047e7f12b71314bf0116da9/html5/thumbnails/13.jpg)
Abstract Syntax Tree (AST)
• Simplifiedsyntac?crepresenta?onsofthesourcecode,andthey'remostopenexpressedbythedatastructuresofthelanguageusedforimplementa?on
• Withoutshowingthewholesyntac?ccluNer,representstheparsedstringinastructuredway,discardingallinforma?onthatmaybeimportantforparsingthestring,butisn'tneededforanalyzingit.
ASTsdifferfromparsetreesbecausesuperficialdis?nc?onsofform,unimportantfortransla?on,donotappearinsyntaxtrees..…DragonBook
![Page 14: Basic Program Analysis - Columbia UniversityBasic Program Analysis Suman Jana *some slides are borrowed from Baishakhi Ray and Ras Bodik Our Goal Program Analyzer Source code Security](https://reader036.vdocuments.us/reader036/viewer/2022071216/6047e7f12b71314bf0116da9/html5/thumbnails/14.jpg)
CStatement:returna+2
AST
![Page 15: Basic Program Analysis - Columbia UniversityBasic Program Analysis Suman Jana *some slides are borrowed from Baishakhi Ray and Ras Bodik Our Goal Program Analyzer Source code Security](https://reader036.vdocuments.us/reader036/viewer/2022071216/6047e7f12b71314bf0116da9/html5/thumbnails/15.jpg)
Disadvantages of ASTs
• ASThasmanysimilarforms• E.g.,for,while,repeat...un?l• E.g.,if,?:,switch
• ExpressionsinASTmaybecomplex,nested• (x*y)+(z>5?12*z:z+20)
• Wantsimplerrepresenta?onforanalysis• ...atleast,fordataflowanalysis
15
intx=1//what’sthevalueofx?//ASTtraversalcangivetheanswer,right?Whataboutintx;x=1;orintx=0;x+=1;?
![Page 16: Basic Program Analysis - Columbia UniversityBasic Program Analysis Suman Jana *some slides are borrowed from Baishakhi Ray and Ras Bodik Our Goal Program Analyzer Source code Security](https://reader036.vdocuments.us/reader036/viewer/2022071216/6047e7f12b71314bf0116da9/html5/thumbnails/16.jpg)
Control Flow Graph & Analysis
![Page 17: Basic Program Analysis - Columbia UniversityBasic Program Analysis Suman Jana *some slides are borrowed from Baishakhi Ray and Ras Bodik Our Goal Program Analyzer Source code Security](https://reader036.vdocuments.us/reader036/viewer/2022071216/6047e7f12b71314bf0116da9/html5/thumbnails/17.jpg)
RepresenBng Control Flow
AdoptedFromUPennCIS570:ModernProgrammingLanguageImplementa?on(Autumn2006)
High-level representation – Control flow is implicit in an AST
Low-level representation: – Use a Control-flow graph (CFG)
– Nodes represent statements (low-level linear IR) – Edges represent explicit flow of control
![Page 18: Basic Program Analysis - Columbia UniversityBasic Program Analysis Suman Jana *some slides are borrowed from Baishakhi Ray and Ras Bodik Our Goal Program Analyzer Source code Security](https://reader036.vdocuments.us/reader036/viewer/2022071216/6047e7f12b71314bf0116da9/html5/thumbnails/18.jpg)
What Is Control-Flow Analysis?
AdoptedFromUPennCIS570:ModernProgrammingLanguageImplementa?on(Autumn2006)
1 2
a := 0 b := a * b
3 L1: c := b/d
4 5 6
if c < x goto L2 e := b / c f := e + 1
7 L2: g := f
8 9
h := t - g if e > 0 goto L3
10 goto L1 11 L3: return
a := 0 b := a * b
e := b / c f : e + 1
g:=fh:=t–gIfe>0?
goto return
c := b / d c < x?
1
3
5
7
1110
Yes No
![Page 19: Basic Program Analysis - Columbia UniversityBasic Program Analysis Suman Jana *some slides are borrowed from Baishakhi Ray and Ras Bodik Our Goal Program Analyzer Source code Security](https://reader036.vdocuments.us/reader036/viewer/2022071216/6047e7f12b71314bf0116da9/html5/thumbnails/19.jpg)
Basic Blocks
• A basic block is a sequence of straight line code that can be entered only at the beginning and exited only at the end
AdoptedFromUPennCIS570:ModernProgrammingLanguageImplementa?on(Autumn2006)
g:=fh:=t–gIfe>0?
Building basic blocks – Identify leaders
– The first instruction in a procedure, or – The target of any branch, or – An instruction immediately following a branch (implicit target)
– Gobble all subsequent instructions until the next leader
![Page 20: Basic Program Analysis - Columbia UniversityBasic Program Analysis Suman Jana *some slides are borrowed from Baishakhi Ray and Ras Bodik Our Goal Program Analyzer Source code Security](https://reader036.vdocuments.us/reader036/viewer/2022071216/6047e7f12b71314bf0116da9/html5/thumbnails/20.jpg)
Basic Block Example
1 2
a := 0 b := a * b
3 L1: c := b/d
4 5 6
if c < x goto L2 e := b / c f := e + 1
7 L2: g := f
8 9
h := t - g if e > 0 goto L3
10 goto L1 11 L3: return
Leaders? – {1, 3, 5, 7, 10, 11}
Blocks? – {1, 2} – {3, 4} – {5, 6} – {7, 8, 9} – {10} – {11}
AdoptedFromUPennCIS570:ModernProgrammingLanguageImplementa?on(Autumn2006)
![Page 21: Basic Program Analysis - Columbia UniversityBasic Program Analysis Suman Jana *some slides are borrowed from Baishakhi Ray and Ras Bodik Our Goal Program Analyzer Source code Security](https://reader036.vdocuments.us/reader036/viewer/2022071216/6047e7f12b71314bf0116da9/html5/thumbnails/21.jpg)
Building a CFG From Basic Block
a := 0 b := a * b
e := b / c f : e + 1
g:=fh:=t–gIfe>0?
goto return
c := b / d c < x?
1
3
5
7
1110
Yes No
Construc=on– EachCFGnoderepresentsabasicblock– Thereisanedgefromnodeitojif– Laststatementofblockibranchestothefirststatementofj,or– Blockidoesnotendwithanuncondi?onalbranchandisimmediatelyfollowedinprogramorderbyblockj(fallthrough)
AdoptedFromUPennCIS570:ModernProgrammingLanguageImplementa?on(Autumn2006)
![Page 22: Basic Program Analysis - Columbia UniversityBasic Program Analysis Suman Jana *some slides are borrowed from Baishakhi Ray and Ras Bodik Our Goal Program Analyzer Source code Security](https://reader036.vdocuments.us/reader036/viewer/2022071216/6047e7f12b71314bf0116da9/html5/thumbnails/22.jpg)
Looping
preheader
head
tail exitedge
Exitedge
backedge
entryedge
Loop
AdoptedFromUPennCIS570:ModernProgrammingLanguageImplementa?on(Autumn2006)
Why?backedgesindicatethatwemightneedtotraversetheCFGmorethanoncefordataflowanalysis
![Page 23: Basic Program Analysis - Columbia UniversityBasic Program Analysis Suman Jana *some slides are borrowed from Baishakhi Ray and Ras Bodik Our Goal Program Analyzer Source code Security](https://reader036.vdocuments.us/reader036/viewer/2022071216/6047e7f12b71314bf0116da9/html5/thumbnails/23.jpg)
Looping
preheader
head
tail exitedge
Exitedge
backedge
entryedge
Loop
Notallloopshavepreheaders–Some?mesitisusefultocreatethem
Withoutpreheadernode–Therecanbemul?pleentryedges
Withsinglepreheadernode–Thereisonlyoneentryedge
AdoptedFromUPennCIS570:ModernProgrammingLanguageImplementa?on(Autumn2006)
![Page 24: Basic Program Analysis - Columbia UniversityBasic Program Analysis Suman Jana *some slides are borrowed from Baishakhi Ray and Ras Bodik Our Goal Program Analyzer Source code Security](https://reader036.vdocuments.us/reader036/viewer/2022071216/6047e7f12b71314bf0116da9/html5/thumbnails/24.jpg)
Dominators
• d dom i if all paths from entry to node i include d
• Strict Dominator (d sdom i) • If d dom i, but d != i
• Immediate dominator (a idom b) • a sdom b and there does not exist any node c such that a != c, c != b, a dom c,
c dom b
• Post dominator (p pdom i) • If every possible path from i to exit includes p
AdoptedFromUPennCIS570:ModernProgrammingLanguageImplementa?on(Autumn2006)
![Page 25: Basic Program Analysis - Columbia UniversityBasic Program Analysis Suman Jana *some slides are borrowed from Baishakhi Ray and Ras Bodik Our Goal Program Analyzer Source code Security](https://reader036.vdocuments.us/reader036/viewer/2022071216/6047e7f12b71314bf0116da9/html5/thumbnails/25.jpg)
IdenBfying Natural Loops and Dominators
• BackEdge• A back edge of a natural loop is one whose target dominates its source
• NaturalLoop• The natural loop of a back edge (m→n), where n dominates m, is the set of
nodes x such that n dominates x and there is a path from x to m not containing n
AdoptedFromUPennCIS570:ModernProgrammingLanguageImplementa?on(Autumn2006)
![Page 26: Basic Program Analysis - Columbia UniversityBasic Program Analysis Suman Jana *some slides are borrowed from Baishakhi Ray and Ras Bodik Our Goal Program Analyzer Source code Security](https://reader036.vdocuments.us/reader036/viewer/2022071216/6047e7f12b71314bf0116da9/html5/thumbnails/26.jpg)
Reducibility • ACFGisreducible(well-structured)ifwecanpar??onitsedgesintotwodisjointsets,theforwardedgesandthebackedges,suchthat – Theforwardedgesformanacyclicgraphinwhicheverynodecanbereachedfromtheentrynode– Thebackedgesconsistonlyofedgeswhosetargetsdominatetheirsources
• Structuredcontrol-flowconstructsgiverisetoreducibleCFGsValueofreducibility:– Dominanceusefuliniden?fyingloops– Simplifiescodetransforma?ons(everyloophasasingleheader)– Permits interval analysis
AdoptedFromUPennCIS570:ModernProgrammingLanguageImplementa?on(Autumn2006)
![Page 27: Basic Program Analysis - Columbia UniversityBasic Program Analysis Suman Jana *some slides are borrowed from Baishakhi Ray and Ras Bodik Our Goal Program Analyzer Source code Security](https://reader036.vdocuments.us/reader036/viewer/2022071216/6047e7f12b71314bf0116da9/html5/thumbnails/27.jpg)
Handling Irreducible CFG’s
• Nodesplivng• CanturnirreducibleCFGsintoreducibleCFGs
a
b
c d
e
b
c
a
d
dʹ e
Generalidea– Reducegraph(itera?velyremoveselfedges,mergenodeswithsinglepred)– Morethanonenode=>irreducible–Splitanymul?-parentnodeandstartover
AdoptedFromUPennCIS570:ModernProgrammingLanguageImplementa?on(Autumn2006)
![Page 28: Basic Program Analysis - Columbia UniversityBasic Program Analysis Suman Jana *some slides are borrowed from Baishakhi Ray and Ras Bodik Our Goal Program Analyzer Source code Security](https://reader036.vdocuments.us/reader036/viewer/2022071216/6047e7f12b71314bf0116da9/html5/thumbnails/28.jpg)
Why go through all this trouble? • Modernlanguagesprovidestructuredcontrolflow– Shouldn’tthecompilerrememberthisinforma?onratherthanthrowitawayandthenre-computeit?
• Answers?– Wemaywanttoworkonthebinarycode– Mostmodernlanguagess?llprovideagotostatement– Languages typically providemul?ple types of loops. This analysis letsus treatthemalluniformly– Wemaywantacompilerwithmul?plefrontendsformul?plelanguages;ratherthantransla?ngeachlanguagetoaCFG,translateeachlanguagetoacanonicalIRandthentoaCFG
AdoptedFromUPennCIS570:ModernProgrammingLanguageImplementa?on(Autumn2006)