scalable program comprehension for analyzing complex ...class.ece.iastate.edu/kothari/invited...

43
June 10, 2008 Copyright © 2008 S.C. Kothari All rights reserved 1 Scalable Program Comprehension for Scalable Program Comprehension for Analyzing Complex Defects Analyzing Complex Defects ICPC 08 Presentation ICPC 08 Presentation S. C. Kothari S. C. Kothari Iowa State University & EnSoft Corp. Iowa State University & EnSoft Corp. Email: Email: [email protected] [email protected] 1908 1908 2002 2002 Working Working Together Together

Upload: others

Post on 20-Oct-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

  • June 10, 2008 Copyright © 2008 S.C. Kothari All rights reserved 1

    Scalable Program Comprehension for Scalable Program Comprehension for Analyzing Complex DefectsAnalyzing Complex Defects

    ICPC 08 PresentationICPC 08 Presentation

    S. C. KothariS. C. KothariIowa State University & EnSoft Corp.Iowa State University & EnSoft Corp.

    Email: Email: [email protected]@iastate.edu

    19081908 20022002

    Working Working

    TogetherTogether

    mailto:[email protected]

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 22

    Operating System/360 Operating System/360 The notThe not--unexpected passing away of OS/360 in its unexpected passing away of OS/360 in its 21st release 21st release –– August 2, 1972.August 2, 1972.Obituary:Obituary:

    The offspring first saw the light of day in December 1965 The offspring first saw the light of day in December 1965 and the birth announcement recorded a weight of 64K. It and the birth announcement recorded a weight of 64K. It rapidly became apparent that OS, in spite of its unusual rapidly became apparent that OS, in spite of its unusual size, was more than normally subject to childhood diseases. size, was more than normally subject to childhood diseases. For a long period, this weak and sickly baby hovered close For a long period, this weak and sickly baby hovered close to death despite almost continuous transformations and to death despite almost continuous transformations and major transplants of several vital organs. Many experts are major transplants of several vital organs. Many experts are of the opinion that the huge weight of OS at birth of the opinion that the huge weight of OS at birth contributed greatly to its early ill health. OS is survived by contributed greatly to its early ill health. OS is survived by two lineal descendants, OS/VS1 and OS/VS2. It will be two lineal descendants, OS/VS1 and OS/VS2. It will be mourned by its many friends and particularly by the over mourned by its many friends and particularly by the over 10,000 system programmers throughout the world who owe 10,000 system programmers throughout the world who owe their jobs to its existence. their jobs to its existence.

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 33

    Millions of Lines of Code in Windows

    0

    10

    20

    30

    40

    50

    60

    3.1 3.5 3.51 4.0 2000 XP Vista

    Windows Operating SystemWindows Operating System

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 44

    A Classroom ExperienceA Classroom Experience

    In an operating system course project, a student spends:• 40 hours in identifying and

    understanding the relevant parts of code.

    • 2 hours in making the actual code changes to incorporate the specified functionality.

    • 10 hours in testing and debugging the code.

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 55

    Software Reliability Software Reliability –– Huge ProblemHuge Problem

    Infamous Ariane 5 disaster, caused by a bug in the rocket’s control software.

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 66

    Problem Solving with PC ToolsProblem Solving with PC Tools

    Problem: A clear definition including Problem: A clear definition including the variations.the variations.Solution:Solution:•• Estimating the work and the costEstimating the work and the cost•• Ease of applicabilityEase of applicability•• Scalability to large softwareScalability to large software•• Differentiating factorsDifferentiating factors

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 77

    Matching Pair (MP) Defects Matching Pair (MP) Defects

    Defect Defect -- if certain if certain program artifactsprogram artifactsare not in matching pairs.are not in matching pairs.A wide array of MP defects: nonA wide array of MP defects: non--matching parentheses, memory matching parentheses, memory leaks, synchronization problems etc.leaks, synchronization problems etc.Different levels of comprehension Different levels of comprehension complexity.complexity.

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 88

    Levels of ComplexityLevels of Complexity

    Four Levels:Four Levels:•• Level I involves knowledge of syntaxLevel I involves knowledge of syntax•• Level II involves knowledge of control Level II involves knowledge of control

    flowflow•• Level III involves knowledge of control Level III involves knowledge of control

    flow and data flowflow and data flow•• Level IV involves knowledge of control Level IV involves knowledge of control

    flow, data flow, and control transfer.flow, data flow, and control transfer.

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 99

    MPMP--1 Defects 1 Defects

    Syntactic program artifacts such as Syntactic program artifacts such as parentheses. parentheses. Matching Constraint:Matching Constraint:•• LocalLocal: matching must be within a program : matching must be within a program

    statement or a block. statement or a block. •• LIFOLIFO: Different types of artifacts must : Different types of artifacts must

    individually match according to the individually match according to the LIFOLIFOproperty.property.

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 1010

    MPMP--2 Defects2 Defects

    Matching involves control flow. Matching involves control flow. Matching must happen on all Matching must happen on all feasible feasible execution pathsexecution paths..Example: matching pairs of functions Example: matching pairs of functions to to disabledisable and and enableenable interrupts. interrupts.

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 1111

    Feasible vs. nonFeasible vs. non--feasible pathsfeasible pathsRead (X);

    A = B = C =5;

    If (X > 5)

    enable();

    If (X == 0)

    A = B + C;

    If (X > 5)

    disable();

    Print (A,B,C);

    Note that enable() and disable() match on all feasible control paths. There are infeasible control paths on which they do not match.

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 1212

    MPMP--3 Defects3 Defects

    Matching Constraint: Matching Constraint: •• Involves Involves control flowcontrol flow and and data flowdata flow. . •• Matching must happen on all Matching must happen on all feasible feasible

    execution pathsexecution paths..•• Matching involves data elements.Matching involves data elements.

    Example: memory leaksExample: memory leaks•• Allocation not matched by deallocation.Allocation not matched by deallocation.•• Matching requires allocation and deallocation Matching requires allocation and deallocation

    to have pointers to the same memory location.to have pointers to the same memory location.

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 1313

    MPMP--4 Defects4 Defects

    Similar to MPSimilar to MP--3, but complicated by 3, but complicated by concurrent and interrupt processing.concurrent and interrupt processing.Implication Implication -- the feasible execution paths the feasible execution paths may not be directly linked by control flow.may not be directly linked by control flow.Example: memory leaksExample: memory leaks•• Allocation may be matched by deallocation Allocation may be matched by deallocation

    across a different thread or deallocation done by across a different thread or deallocation done by an interrupt service routine.an interrupt service routine.

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 1414

    Shared pool of tokens

    D1

    T2

    f -1

    A different execution thread

    E1

    S2

    E3

    E2

    f

    f -1

    Matching Pairs

    Watch a scenario that makes the second path not defective.

    Watch a second path –appears to be defective.

    Watch one non-defective path.

    Matching Pairs

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 1515

    Knowledge-Centric Software (KCS) tools technology

    … Work at EnSoft and ISU

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 1616

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 1717

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 1818

    Tools to amplify human intelligenceTools to amplify human intelligence

    Fredrick Brooks:Fredrick Brooks:“... IA > AI, that is, that intelligence amplifying systems can, at any given level of available systems technology, beat AI systems. That is, a machine and a mind can beat a mind-imitating machine working by itself.”

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 1919

    QueryQuery--ModelModel--Refine (QMR) Refine (QMR) TechniqueTechnique

    A natural way to amplify human intelligence by assisting in: •Retrieval of information by analyzing software.

    •Generation of visual models from the retrieved information.

    •Refinement of the models to manage complexity.

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 2020

    A DemonstrationA Demonstration

    Defect analysis using the QMR Defect analysis using the QMR technique to program technique to program comprehension.comprehension.We will show how to analyze the We will show how to analyze the Linux operating systems for MP Linux operating systems for MP defects.defects.

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 2121

    A Defect Analysis Problem A Defect Analysis Problem

    Problem: analyze the Linux 2.6 Problem: analyze the Linux 2.6 kernel for MP defects w.r.t. kernel for MP defects w.r.t. mutex_lockmutex_lock and and mutex_unlockmutex_unlockfunctions.functions.

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 2222

    IA ApproachIA Approach

    •• Define a comprehension strategy to Define a comprehension strategy to solve the problem. solve the problem.

    •• Use the tool to execute the strategy.Use the tool to execute the strategy.•• Implication: To design useful tools, prior Implication: To design useful tools, prior

    understanding of problems and solution understanding of problems and solution strategies is important. strategies is important.

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 2323

    A Defect Analysis SolutionA Defect Analysis Solution

    Design a sequence of solution steps.Design a sequence of solution steps.Define the query, model, or the Define the query, model, or the refinement to be done at each step.refinement to be done at each step.Quantify the work in an early stage Quantify the work in an early stage of the solution process.of the solution process.Argue that all the possible cases are Argue that all the possible cases are handled.handled.

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 2424

    Step 1Step 1Divide the problem in two cases:Divide the problem in two cases:•• Case 1: the Case 1: the locklock and and unlockunlock are called are called

    within the same function.within the same function.•• Case 2: the Case 2: the locklock and and unlockunlock are not are not

    called within the same function.called within the same function.

    We will follow up case 2.We will follow up case 2.Execute queries to obtain a list of Execute queries to obtain a list of functions that call functions that call locklock but but notnotunlockunlock..

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 2525

    Result 1Result 1

    LOCKUNLOCK

    401 functions that call lock

    436 functions that call unlock

    51 functions call unlock but not lock

    16 functions call lock but not unlock

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 2626

    A function calls lock but not unlockA function calls lock but not unlock

    static void *diskstats_start(struct seq_file *part, loff_t *pos){loff_t k = *pos;struct list_head *p;

    mutex_lock(&block_subsys_lock);list_for_each(p, &block_subsys.kset.list)

    if (!k--)return list_entry(p, struct gendisk, kobj.entry);

    return NULL;}

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 2727

    Checking a PossibilityChecking a PossibilityAnalyze these 16 functions that call Analyze these 16 functions that call lock but not unlock.lock but not unlock.For such a function For such a function gg check the check the possibility: an ancestor possibility: an ancestor ff of of gg calls calls unlock (directly or indirectly).unlock (directly or indirectly).

    ffhh

    gg LOCK

    UNLOCK

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 2828

    What should be the query to find What should be the query to find the ancestor?the ancestor?

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 2929

    f1f1hh

    gg LOCK

    UNLOCK

    f2f2

    f3f3hh UNLOCK

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 3030

    Step 2Step 2

    Execute a query to find the Execute a query to find the rootsroots of of reverse call graphreverse call graph with the given 16 with the given 16 functions as leaves. functions as leaves.

    16 functions that call lock but not unlock

    Separate threads or interrupt service routines

    183 roots

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 3131

    f1f1hh

    gg LOCK

    UNLOCK

    f2f2

    f3f3hh UNLOCK

    UNLOCK

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 3232

    Step 3Step 3Partition the roots into groups of Partition the roots into groups of related functions.related functions.

    16 functions that call lock but not unlock

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 3333

    Step 4Step 4Model: Call tree with a selected group of Model: Call tree with a selected group of roots and lock and unlock as leaves.roots and lock and unlock as leaves.For demonstration, we will select:For demonstration, we will select:•• idecd_ioctlidecd_ioctl

    •• idedisk_ioctlidedisk_ioctl

    Note that the ancestorNote that the ancestor f f where the lock where the lock and unlock can be matched, if it exists, and unlock can be matched, if it exists, can be found through the above model. can be found through the above model.

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 3434

    Result Result Represents two cases which can be analyzed separately

    87 nodes

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 3535

    Step 5Step 5Refine the Model: Omit one case at a Refine the Model: Omit one case at a time. time. This refinement is achieved by a This refinement is achieved by a graph transformation provided by graph transformation provided by Atlas.Atlas.

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 3636

    ResultResultThis is not an ancestor of the function gg

    62 nodes

    Lock

    The function g g that calls LOCK but not unlock.

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 3737

    Step 6Step 6Further Refinement: Omit the part Further Refinement: Omit the part which is not an ancestor of which is not an ancestor of g. . This refinement is achieved by a This refinement is achieved by a graph transformation provided by graph transformation provided by Atlas.Atlas.

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 3838

    ResultResult

    The ancestor where the lock and unlock call can be matched.

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 3939

    Questions ??Questions ??

    Would like to discuss:Would like to discuss:•• Program comprehension toolsProgram comprehension tools•• Program comprehension problems and Program comprehension problems and

    solution strategiessolution strategies•• Query languagesQuery languages•• Use of graph theory in program Use of graph theory in program

    comprehensioncomprehension

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 4040

    Atlas Atlas –– A Program Mapping ToolA Program Mapping Tool

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 4141

    Total Insight Total Insight –– A COBOL ToolA COBOL Tool

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 4242

    SimDiff SimDiff –– A Model Differencing ToolA Model Differencing Tool

  • June 10, 2008June 10, 2008 Copyright Copyright ©© 2008 S.C. Kothari All rights reserved2008 S.C. Kothari All rights reserved 4343

    ReferencesReferencesFredrick Brooks, Fredrick Brooks, TheThe Computer Scientist as Computer Scientist as ToolsmithToolsmith,,•• http://www.cs.unc.edu/~brooks/Toolsmithhttp://www.cs.unc.edu/~brooks/Toolsmith--CACM.pdfCACM.pdf

    Earliest paper on software complexity: Earliest paper on software complexity: •• Rubey, R.J.; Hartwick, R.D.: Quantitative Measurement Rubey, R.J.; Hartwick, R.D.: Quantitative Measurement

    Program Quality. ACM, National Computer Conference pp. Program Quality. ACM, National Computer Conference pp. 671671--677, 1968.677, 1968.

    KnowledgeKnowledge--Centric Software Research Laboratory at Centric Software Research Laboratory at ISUISU•• http://dirac.ece.iastate.edu/sec/http://dirac.ece.iastate.edu/sec/

    EnSoft Corp. EnSoft Corp. •• http://www.ensoftcorp.com/http://www.ensoftcorp.com/

    http://www.cs.unc.edu/~brooks/Toolsmith-CACM.pdfhttp://dirac.ece.iastate.edu/sec/http://www.ensoftcorp.com/

    Scalable Program Comprehension for Analyzing Complex Defects��ICPC 08 PresentationOperating System/360 Windows Operating SystemA Classroom ExperienceSoftware Reliability – Huge ProblemProblem Solving with PC ToolsMatching Pair (MP) Defects Levels of ComplexityMP-1 Defects MP-2 DefectsFeasible vs. non-feasible pathsMP-3 DefectsMP-4 DefectsTools to amplify human intelligenceQuery-Model-Refine (QMR) TechniqueA DemonstrationA Defect Analysis Problem IA ApproachA Defect Analysis SolutionStep 1Result 1A function calls lock but not unlockChecking a PossibilityWhat should be the query to find the ancestor? Step 2 Step 3 Step 4Result Step 5Result Step 6ResultQuestions ??Atlas – A Program Mapping ToolTotal Insight – A COBOL ToolSimDiff – A Model Differencing ToolReferences