scalable program comprehension for analyzing complex ...class.ece.iastate.edu/kothari/invited...
TRANSCRIPT
-
June 10, 2008 Copyright © 2008 S.C. Kothari All rights reserved 1
Scalable Program Comprehension for Scalable Program Comprehension for Analyzing Complex DefectsAnalyzing Complex Defects
ICPC 08 PresentationICPC 08 Presentation
S. C. KothariS. C. KothariIowa State University & EnSoft Corp.Iowa State University & EnSoft Corp.
Email: Email: [email protected]@iastate.edu
19081908 20022002
Working Working
TogetherTogether
mailto:[email protected]
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 22
Operating System/360 Operating System/360 The notThe not--unexpected passing away of OS/360 in its unexpected passing away of OS/360 in its 21st release 21st release –– August 2, 1972.August 2, 1972.Obituary:Obituary:
The offspring first saw the light of day in December 1965 The offspring first saw the light of day in December 1965 and the birth announcement recorded a weight of 64K. It and the birth announcement recorded a weight of 64K. It rapidly became apparent that OS, in spite of its unusual rapidly became apparent that OS, in spite of its unusual size, was more than normally subject to childhood diseases. size, was more than normally subject to childhood diseases. For a long period, this weak and sickly baby hovered close For a long period, this weak and sickly baby hovered close to death despite almost continuous transformations and to death despite almost continuous transformations and major transplants of several vital organs. Many experts are major transplants of several vital organs. Many experts are of the opinion that the huge weight of OS at birth of the opinion that the huge weight of OS at birth contributed greatly to its early ill health. OS is survived by contributed greatly to its early ill health. OS is survived by two lineal descendants, OS/VS1 and OS/VS2. It will be two lineal descendants, OS/VS1 and OS/VS2. It will be mourned by its many friends and particularly by the over mourned by its many friends and particularly by the over 10,000 system programmers throughout the world who owe 10,000 system programmers throughout the world who owe their jobs to its existence. their jobs to its existence.
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 33
Millions of Lines of Code in Windows
0
10
20
30
40
50
60
3.1 3.5 3.51 4.0 2000 XP Vista
Windows Operating SystemWindows Operating System
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 44
A Classroom ExperienceA Classroom Experience
In an operating system course project, a student spends:• 40 hours in identifying and
understanding the relevant parts of code.
• 2 hours in making the actual code changes to incorporate the specified functionality.
• 10 hours in testing and debugging the code.
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 55
Software Reliability Software Reliability –– Huge ProblemHuge Problem
Infamous Ariane 5 disaster, caused by a bug in the rocket’s control software.
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 66
Problem Solving with PC ToolsProblem Solving with PC Tools
Problem: A clear definition including Problem: A clear definition including the variations.the variations.Solution:Solution:•• Estimating the work and the costEstimating the work and the cost•• Ease of applicabilityEase of applicability•• Scalability to large softwareScalability to large software•• Differentiating factorsDifferentiating factors
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 77
Matching Pair (MP) Defects Matching Pair (MP) Defects
Defect Defect -- if certain if certain program artifactsprogram artifactsare not in matching pairs.are not in matching pairs.A wide array of MP defects: nonA wide array of MP defects: non--matching parentheses, memory matching parentheses, memory leaks, synchronization problems etc.leaks, synchronization problems etc.Different levels of comprehension Different levels of comprehension complexity.complexity.
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 88
Levels of ComplexityLevels of Complexity
Four Levels:Four Levels:•• Level I involves knowledge of syntaxLevel I involves knowledge of syntax•• Level II involves knowledge of control Level II involves knowledge of control
flowflow•• Level III involves knowledge of control Level III involves knowledge of control
flow and data flowflow and data flow•• Level IV involves knowledge of control Level IV involves knowledge of control
flow, data flow, and control transfer.flow, data flow, and control transfer.
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 99
MPMP--1 Defects 1 Defects
Syntactic program artifacts such as Syntactic program artifacts such as parentheses. parentheses. Matching Constraint:Matching Constraint:•• LocalLocal: matching must be within a program : matching must be within a program
statement or a block. statement or a block. •• LIFOLIFO: Different types of artifacts must : Different types of artifacts must
individually match according to the individually match according to the LIFOLIFOproperty.property.
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 1010
MPMP--2 Defects2 Defects
Matching involves control flow. Matching involves control flow. Matching must happen on all Matching must happen on all feasible feasible execution pathsexecution paths..Example: matching pairs of functions Example: matching pairs of functions to to disabledisable and and enableenable interrupts. interrupts.
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 1111
Feasible vs. nonFeasible vs. non--feasible pathsfeasible pathsRead (X);
A = B = C =5;
If (X > 5)
enable();
If (X == 0)
A = B + C;
If (X > 5)
disable();
Print (A,B,C);
Note that enable() and disable() match on all feasible control paths. There are infeasible control paths on which they do not match.
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 1212
MPMP--3 Defects3 Defects
Matching Constraint: Matching Constraint: •• Involves Involves control flowcontrol flow and and data flowdata flow. . •• Matching must happen on all Matching must happen on all feasible feasible
execution pathsexecution paths..•• Matching involves data elements.Matching involves data elements.
Example: memory leaksExample: memory leaks•• Allocation not matched by deallocation.Allocation not matched by deallocation.•• Matching requires allocation and deallocation Matching requires allocation and deallocation
to have pointers to the same memory location.to have pointers to the same memory location.
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 1313
MPMP--4 Defects4 Defects
Similar to MPSimilar to MP--3, but complicated by 3, but complicated by concurrent and interrupt processing.concurrent and interrupt processing.Implication Implication -- the feasible execution paths the feasible execution paths may not be directly linked by control flow.may not be directly linked by control flow.Example: memory leaksExample: memory leaks•• Allocation may be matched by deallocation Allocation may be matched by deallocation
across a different thread or deallocation done by across a different thread or deallocation done by an interrupt service routine.an interrupt service routine.
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 1414
Shared pool of tokens
D1
T2
f -1
A different execution thread
E1
S2
E3
E2
f
f -1
Matching Pairs
Watch a scenario that makes the second path not defective.
Watch a second path –appears to be defective.
Watch one non-defective path.
Matching Pairs
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 1515
Knowledge-Centric Software (KCS) tools technology
… Work at EnSoft and ISU
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 1616
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 1717
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 1818
Tools to amplify human intelligenceTools to amplify human intelligence
Fredrick Brooks:Fredrick Brooks:“... IA > AI, that is, that intelligence amplifying systems can, at any given level of available systems technology, beat AI systems. That is, a machine and a mind can beat a mind-imitating machine working by itself.”
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 1919
QueryQuery--ModelModel--Refine (QMR) Refine (QMR) TechniqueTechnique
A natural way to amplify human intelligence by assisting in: •Retrieval of information by analyzing software.
•Generation of visual models from the retrieved information.
•Refinement of the models to manage complexity.
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 2020
A DemonstrationA Demonstration
Defect analysis using the QMR Defect analysis using the QMR technique to program technique to program comprehension.comprehension.We will show how to analyze the We will show how to analyze the Linux operating systems for MP Linux operating systems for MP defects.defects.
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 2121
A Defect Analysis Problem A Defect Analysis Problem
Problem: analyze the Linux 2.6 Problem: analyze the Linux 2.6 kernel for MP defects w.r.t. kernel for MP defects w.r.t. mutex_lockmutex_lock and and mutex_unlockmutex_unlockfunctions.functions.
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 2222
IA ApproachIA Approach
•• Define a comprehension strategy to Define a comprehension strategy to solve the problem. solve the problem.
•• Use the tool to execute the strategy.Use the tool to execute the strategy.•• Implication: To design useful tools, prior Implication: To design useful tools, prior
understanding of problems and solution understanding of problems and solution strategies is important. strategies is important.
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 2323
A Defect Analysis SolutionA Defect Analysis Solution
Design a sequence of solution steps.Design a sequence of solution steps.Define the query, model, or the Define the query, model, or the refinement to be done at each step.refinement to be done at each step.Quantify the work in an early stage Quantify the work in an early stage of the solution process.of the solution process.Argue that all the possible cases are Argue that all the possible cases are handled.handled.
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 2424
Step 1Step 1Divide the problem in two cases:Divide the problem in two cases:•• Case 1: the Case 1: the locklock and and unlockunlock are called are called
within the same function.within the same function.•• Case 2: the Case 2: the locklock and and unlockunlock are not are not
called within the same function.called within the same function.
We will follow up case 2.We will follow up case 2.Execute queries to obtain a list of Execute queries to obtain a list of functions that call functions that call locklock but but notnotunlockunlock..
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 2525
Result 1Result 1
LOCKUNLOCK
401 functions that call lock
436 functions that call unlock
51 functions call unlock but not lock
16 functions call lock but not unlock
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 2626
A function calls lock but not unlockA function calls lock but not unlock
static void *diskstats_start(struct seq_file *part, loff_t *pos){loff_t k = *pos;struct list_head *p;
mutex_lock(&block_subsys_lock);list_for_each(p, &block_subsys.kset.list)
if (!k--)return list_entry(p, struct gendisk, kobj.entry);
return NULL;}
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 2727
Checking a PossibilityChecking a PossibilityAnalyze these 16 functions that call Analyze these 16 functions that call lock but not unlock.lock but not unlock.For such a function For such a function gg check the check the possibility: an ancestor possibility: an ancestor ff of of gg calls calls unlock (directly or indirectly).unlock (directly or indirectly).
ffhh
gg LOCK
UNLOCK
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 2828
What should be the query to find What should be the query to find the ancestor?the ancestor?
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 2929
f1f1hh
gg LOCK
UNLOCK
f2f2
f3f3hh UNLOCK
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 3030
Step 2Step 2
Execute a query to find the Execute a query to find the rootsroots of of reverse call graphreverse call graph with the given 16 with the given 16 functions as leaves. functions as leaves.
16 functions that call lock but not unlock
Separate threads or interrupt service routines
183 roots
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 3131
f1f1hh
gg LOCK
UNLOCK
f2f2
f3f3hh UNLOCK
UNLOCK
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 3232
Step 3Step 3Partition the roots into groups of Partition the roots into groups of related functions.related functions.
16 functions that call lock but not unlock
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 3333
Step 4Step 4Model: Call tree with a selected group of Model: Call tree with a selected group of roots and lock and unlock as leaves.roots and lock and unlock as leaves.For demonstration, we will select:For demonstration, we will select:•• idecd_ioctlidecd_ioctl
•• idedisk_ioctlidedisk_ioctl
Note that the ancestorNote that the ancestor f f where the lock where the lock and unlock can be matched, if it exists, and unlock can be matched, if it exists, can be found through the above model. can be found through the above model.
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 3434
Result Result Represents two cases which can be analyzed separately
87 nodes
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 3535
Step 5Step 5Refine the Model: Omit one case at a Refine the Model: Omit one case at a time. time. This refinement is achieved by a This refinement is achieved by a graph transformation provided by graph transformation provided by Atlas.Atlas.
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 3636
ResultResultThis is not an ancestor of the function gg
62 nodes
Lock
The function g g that calls LOCK but not unlock.
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 3737
Step 6Step 6Further Refinement: Omit the part Further Refinement: Omit the part which is not an ancestor of which is not an ancestor of g. . This refinement is achieved by a This refinement is achieved by a graph transformation provided by graph transformation provided by Atlas.Atlas.
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 3838
ResultResult
The ancestor where the lock and unlock call can be matched.
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 3939
Questions ??Questions ??
Would like to discuss:Would like to discuss:•• Program comprehension toolsProgram comprehension tools•• Program comprehension problems and Program comprehension problems and
solution strategiessolution strategies•• Query languagesQuery languages•• Use of graph theory in program Use of graph theory in program
comprehensioncomprehension
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 4040
Atlas Atlas –– A Program Mapping ToolA Program Mapping Tool
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 4141
Total Insight Total Insight –– A COBOL ToolA COBOL Tool
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 4242
SimDiff SimDiff –– A Model Differencing ToolA Model Differencing Tool
-
June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 4343
ReferencesReferencesFredrick Brooks, Fredrick Brooks, TheThe Computer Scientist as Computer Scientist as ToolsmithToolsmith,,•• http://www.cs.unc.edu/~brooks/Toolsmithhttp://www.cs.unc.edu/~brooks/Toolsmith--CACM.pdfCACM.pdf
Earliest paper on software complexity: Earliest paper on software complexity: •• Rubey, R.J.; Hartwick, R.D.: Quantitative Measurement Rubey, R.J.; Hartwick, R.D.: Quantitative Measurement
Program Quality. ACM, National Computer Conference pp. Program Quality. ACM, National Computer Conference pp. 671671--677, 1968.677, 1968.
KnowledgeKnowledge--Centric Software Research Laboratory at Centric Software Research Laboratory at ISUISU•• http://dirac.ece.iastate.edu/sec/http://dirac.ece.iastate.edu/sec/
EnSoft Corp. EnSoft Corp. •• http://www.ensoftcorp.com/http://www.ensoftcorp.com/
http://www.cs.unc.edu/~brooks/Toolsmith-CACM.pdfhttp://dirac.ece.iastate.edu/sec/http://www.ensoftcorp.com/
Scalable Program Comprehension for Analyzing Complex Defects��ICPC 08 PresentationOperating System/360 Windows Operating SystemA Classroom ExperienceSoftware Reliability – Huge ProblemProblem Solving with PC ToolsMatching Pair (MP) Defects Levels of ComplexityMP-1 Defects MP-2 DefectsFeasible vs. non-feasible pathsMP-3 DefectsMP-4 DefectsTools to amplify human intelligenceQuery-Model-Refine (QMR) TechniqueA DemonstrationA Defect Analysis Problem IA ApproachA Defect Analysis SolutionStep 1Result 1A function calls lock but not unlockChecking a PossibilityWhat should be the query to find the ancestor? Step 2 Step 3 Step 4Result Step 5Result Step 6ResultQuestions ??Atlas – A Program Mapping ToolTotal Insight – A COBOL ToolSimDiff – A Model Differencing ToolReferences