Empirical Design and Analysis of a Defect
Taxonomy for Novice Programmers
Lu Zhang
This thesis is presented
For the degree of Master of Science of
The University of Western Australia
School of Computer Science & Software Engineering
2012
Abstract
Students in first-year computer science at many universities are required to enroll
in an introductory programming course to learn Java. Programming defects
provide useful information revealing the challenges students face in achieving
high quality code. The identification of defects also enables instructors to improve
their teaching by placing greater emphasis on those areas where students are
struggling.
In this dissertation, a range of defect types has been identified and a taxonomy,
called the Novice Defect Taxonomy (NDT), has been developed. This taxonomy may be used
to hierarchically classify defects in a clear and reproducible way. Its derivation
from a large number of student assignments is described. Assignments are
assessed within a defect measurement framework which combines dynamic and
static analysis. The approach measures defects in functionality, code style,
language syntax and code completeness. Based on the analyses, it is shown that
automatic assistance has a positive impact on the program quality of novice
programmers. Students rapidly accept automatic tools. Finally, this taxonomy
provides other researchers with a framework and reference baseline for
developing new defect classifications.
Acknowledgements
Firstly, I wish to thank my coordinating supervisor, Professor Rachel
Cardell-Oliver, who convinced me to take the research opportunity and introduced
me to the research area. In addition, I would like to thank Rachel for her generous
time and extraordinary support.
I would like to thank my co-supervisor, Terry Woodings, for the useful discussions
and suggestions about the data analysis. I thank Rachel and Terry for their
valuable advice, encouragement and endless help throughout my study, and for
their fast feedback on my writing.
I express my thanks to the CSSE computer system administrators Askley
Chew and Laurie McKeaing for their timely support.
Additional thanks go to all my fellow researchers for sharing their research
experience and helping me learn research skills.
Lastly, I express my thanks to my family for their endless support during the
last two years while I stayed in Perth.
Publication
This dissertation contains some results from the publication:
R. Cardell-Oliver, L. Zhang, R. Barady, R.H. Lim, A. Naveed & T.
Woodings, "Automated Feedback for Quality Assurance in Software Engineering
Education", in Proceedings of the 2010 21st Australian Software Engineering
Conference, pp. 157-164, 2010.
The author of this dissertation is the second author of the publication above.
She is responsible for the part reporting the results of dynamic testing.
Table of Contents
Abstract .............................................................................................................. 2
Acknowledgements ............................................................................................ 3
Publication ......................................................................................................... 4
Table of Contents ............................................................................................... 5
List of Figures .................................................................................................... 7
List of Tables ...................................................................................................... 8
Chapter 1 Introduction .................................................................................... 10
1.1 Motivation ............................................................................................... 10
1.2 Challenges ............................................................................................... 12
1.3 Approach and Scope ................................................................................. 13
1.4 Research Questions .................................................................................. 16
1.5 Contribution ............................................................................................. 18
1.6 Thesis Outline .......................................................................................... 19
Chapter 2 Literature Review ........................................................................... 20
2.1 Specifications of Defects .......................................................................... 20
2.1.1 What is a Software Defect?................................................................ 20
2.1.2 Defect Taxonomies ............................................................................ 22
2.1.2.1 Novices vs Experts ..................................................... 22
2.1.2.2 Qualitative Analysis vs Quantitative Analysis of Defects ........... 24
2.1.2.3 Object Oriented Programming Languages vs Procedural
Languages .............................................................. 25
2.2 Measurement of Defects ........................................................................... 26
2.2.1 Automatic Assessment ....................................................................... 26
2.2.2 Code Inspection ................................................................................. 27
2.3 Categories of Defects ............................................................................... 27
2.4 Research Gaps.......................................................................................... 28
Chapter 3 Data Collection for the Defect Taxonomy ...................................... 30
3.1 Subject and Exercise Choice .................................................................... 30
3.1.1 Subject Choice .................................................................................. 30
3.1.2 Exercise Choice ................................................................................. 31
3.2 Data Collection Mechanism ...................................................................... 33
3.3 Defect Measurement ................................................................................. 35
3.3.1 Software Attributes for Measuring ..................................................... 35
3.3.2 Defect Counting Approaches ............................................................. 36
3.3.3 Defect Detection Framework ............................................................. 37
3.3.4 Compilation Detection ....................................................................... 39
3.3.5 Evolvability Fault Detection .............................................................. 39
3.3.6 Functional Correctness Detection....................................................... 40
3.3.7 Code Inspection ................................................................................. 48
3.4 Measurement Instruments ......................................................................... 51
3.4.1 Integrated Development Environments .............................................. 51
3.4.2 JUnit .................................................................................................. 51
3.4.3 Checkstyle ......................................................................................... 52
3.4.4 PMD .................................................................................................. 52
3.5 Comparison of Static Analysis Tools......................................................... 53
3.6 Static Analysis Tools in Practice ............................................................... 54
3.7 Measurement Risks .................................................................................. 55
Chapter 4 Novice Defect Taxonomy Specification .......................................... 57
4.1 Novice Defect Taxonomy ......................................................................... 57
4.2 Defect Specification ................................................................................. 59
4.3 Summary .................................................................................................. 75
Chapter 5 Analysis Using the Novice Defect Taxonomy ................................. 76
5.1 Comparison of NDT Defect Categories with Other Defect Taxonomies .... 77
5.2 Quantitative Analysis of Defects ............................................................... 79
5.3 Defect Patterns and the Difficulty of Exercises ......................................... 80
5.4 Using NDT to Analyze the Impact of Automatic Feedback ....................... 82
5.5 Using NDT to Improve Teaching Strategy ................................................ 87
5.6 Summary .................................................................................................. 90
Chapter 6 Conclusion....................................................................................... 91
6.1 Contribution ............................................................................................. 91
6.2 Future Work ............................................................................................. 91
6.3 Conclusion ............................................................................................... 92
List of Figures
Figure 1.1. Research Areas Addressed in this Study …………………………….14
Figure 3.1. Recommended Process for Completing a Programming Assignment 34
Figure 3.2. A Summary of Measurement Validation Concepts ............................ 36
Figure 3.3. An Overview of a Defect Measurement Process ............................... 38
Figure 3.4. The TextAnalyser Assignment .......................................................... 41
Figure 3.5. Test Cases for frequencyOf() ............................................................ 43
Figure 3.6. An Example of Buggy Fragment of Class TextAnalyser ................... 44
Figure 3.7. Detection Coverage of Static Analysis Tools ......................... 53
Figure 3.8. A Solution of Class CustomersList ................................................... 54
Figure 4.1. Novice Defect Taxonomy ................................................................ 58
Figure 4.2. Levels 1 and 2 of the Novice Defect Taxonomy ............................... 59
Figure 4.3. CANNOT COMPILE Class .............................................................. 60
Figure 4.4. FUNCTIONAL DEFECT Taxonomy ................................................ 64
Figure 4.5. EVOLVABILITY DEFECT Taxonomy .............................................. 70
List of Tables
Table 2. 1. Comparison of Defect Taxonomies (Main Categories) ..................... 28
Table 3. 1. Student Experience and Laboratory Support for Different Cohorts .. 31
Table 3. 2. Sizes of Java Assignments ............................................................... 32
Table 3. 3. Java Language Constructs used in Assignments ............................... 32
Table 3. 4. Cohort Size and Submissions for Each Assignment ......................... 33
Table 3. 5. Metrics for Evolvability Fault Detection .......................................... 39
Table 3. 6. Test Case Failures and Relevant Defects in Program ........................ 48
Table 3. 7. Functional Property Inspection Form ............................................... 49
Table 3. 8. Functional Defect Count Checklist .................................................. 50
Table 5. 1. Top Ten Defects from the UWA Data Set ......................................... 80
Table 5. 2. FUNCTION MISSING (D2.1.1.1) Defects for Labs at Different
Complexity Levels ........................................................................................... 81
Table 5. 3. Distribution of Novice Defects ........................................................ 83
Table 5. 4. Error Information (TDC) of Lab B1 and D1 on the Basis of
Sub-classes of FUNCTIONAL DEFECT ....................................... 84
Table 5. 5. Error Information (CDC) of Lab B1 and D1 on the Basis of
Sub-classes of FUNCTIONAL DEFECT ....................................................... 84
Table 5. 6. Error Information (TDC) of Lab B1 and D1 on the Basis of Bottom
Level Classes of FUNCTIONAL DEFECT ................................................... 85
Table 5. 7. Error Information (CDC) of Lab B1 and D1 on the Basis of Bottom
Level Classes of FUNCTIONAL DEFECT ................................................... 85
Table 5. 8. Error Information (TDC) of Lab B1 and D1 on the Basis of
Sub-classes of EVOLVABILITY DEFECT .................................................... 86
Table 5. 9. Error Information (CDC) of Lab B1 and D1 on the Basis of
Sub-classes of EVOLVABILITY DEFECT .................................................... 86
Table 5. 10. Error Information (TDC) of Lab B1 and D1 on the Basis of Bottom
Level Defect Types of EVOLVABILITY DEFECT ........................................ 87
Table 5. 11. Error Information (CDC) of Lab B1 and D1 on the Basis of Bottom
Level Defect Types of EVOLVABILITY DEFECT ....................................... 87
Table 5. 12. Common Programming Problems, Teaching Strategies and Solutions
......................................................................................................................... 88
Table 5. 13. Top Defect Class, Novice Problems Underlying & Solution in
Teaching Strategies ....................................................................................... 89
Chapter 1 Introduction
Students in first-year computer science at many universities are required to
enroll in an introductory programming course to learn Java. Programming defects
provide useful information revealing the challenges students face in achieving
high quality code. The identification of defects also enables instructors to improve
their teaching by placing greater emphasis on those areas where students are
struggling. In this dissertation, a range of defect types covering functionality
defects, code style defects, syntactic defects and code completeness defects has
been identified, and a taxonomy called the Novice Defect Taxonomy (NDT) has been
developed.
In this chapter, the motivation for our research is discussed in Section 1.1. Then,
the challenges are summarized in Section 1.2 followed by the approaches taken
and the scope of the study in Section 1.3. Section 1.4 outlines eight research
questions addressed in this dissertation. Section 1.5 summarizes the contributions
of this study, and Section 1.6 outlines the structure of the thesis.
1.1 Motivation
Writing error-free programs is not easy for students from the very beginning, no
matter how simple the tasks are. Various aspects may affect novices' learning
outcomes: personal characteristics, personal learning strategies, and prior
knowledge and practices. Personal characteristics such as general intelligence and
mathematics background seem to affect the success of learning to program
(Ala-Mutka 2004). Personal learning strategies affect novices' success in learning
programming strategies (Ala-Mutka 2004; Robins, Rountree & Rountree 2003).
Additionally, novice difficulties are associated with understanding the abstract
nature of programming (Ala-Mutka 2004; Lahtinen, Ala-Mutka & Jarvinen 2005;
Robins, Rountree & Rountree 2003). Students often believe they understand
concepts in programming but still fail to use them properly (Ala-Mutka 2004).
Knowing students' difficulties gives an instructor a chance to understand their
misconceptions in programming. Spohrer & Soloway (1986) argued that "the more we
know about what students know, the better we can teach them." Empirical findings
obtained from analyzing a large number of student submissions enable instructors
to place greater emphasis on student problems and thereby tailor their curriculum
accordingly.
A Software Defect is an imperfection that produces a departure of a system from
its required behavior (IEEE 2010). A Defect Taxonomy is a system of hierarchical
categories for classifying the defects found in programs, organized into both
low-level and high-level categories. Many studies have investigated the types of
defects in software, and many high-level categories for arranging the quantitative
data have been reported (Ahmadzadeh, Elliman & Higgins 2005; Hristova et al.
2003). However, high-level categories alone, without any lower-level
sub-categories, fail to match the defects found in student programs (Chillarege,
Kao & Condit 1991), because the defects made by students usually involve
lower-level cases. The hierarchical defect taxonomy introduced in this
dissertation classifies defects in students' programming assignments at four
different levels of abstraction. The focus of our work is on identifying
defects rather than assessing their causes.
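To make the four-level structure concrete, the sketch below models taxonomy categories as a small tree of coded nodes. This is an illustrative reconstruction rather than the thesis implementation: only the codes D2 (FUNCTIONAL DEFECT) and D2.1.1.1 (FUNCTION MISSING) appear in this dissertation (see Table 5.2), and the intermediate labels are placeholders.

    import java.util.ArrayList;
    import java.util.List;

    // A node in a hierarchical defect taxonomy. The dotted depth of a node's
    // code (e.g. "D2.1.1.1") gives its level of abstraction.
    class DefectCategory {
        final String code;   // e.g. "D2.1.1.1"
        final String name;   // e.g. "FUNCTION MISSING"
        final List<DefectCategory> children = new ArrayList<>();

        DefectCategory(String code, String name) {
            this.code = code;
            this.name = name;
        }

        DefectCategory addChild(String code, String name) {
            DefectCategory child = new DefectCategory(code, name);
            children.add(child);
            return child;
        }

        int level() {
            return code.split("\\.").length;  // "D2" -> 1, "D2.1.1.1" -> 4
        }
    }

    class TaxonomySketch {
        public static void main(String[] args) {
            DefectCategory functional = new DefectCategory("D2", "FUNCTIONAL DEFECT");
            DefectCategory sub = functional.addChild("D2.1", "placeholder sub-class");
            DefectCategory subSub = sub.addChild("D2.1.1", "placeholder sub-sub-class");
            subSub.addChild("D2.1.1.1", "FUNCTION MISSING");  // bottom-level defect type
        }
    }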
Researchers have analyzed a wide span of defect possibilities covering
functional, style, efficiency, performance, logic, syntax and completeness defects
in programs (Ahmadzadeh, Elliman & Higgins 2005; Chillarege et al. 1992;
Chillarege, Kao & Condit 1991; Coull et al. 2003; Hristova et al. 2003; Jackson,
Cobb & Carver 2004; Kopec, Yarmish & Cheung 2007; Mantyla & Lassenius
2009; IEEE 2010). This study analyses the defect types of functionality,
programming style, syntax and completeness in novices' programs and develops a
well-designed defect taxonomy by labeling the defects detected in their code. A
well-designed hierarchical defect taxonomy can help instructors identify common
defects made by novice programmers. We believe that identifying defects in
student programs can help students focus on their problems and make the necessary
corrections. Furthermore, analysis using this taxonomy shows instructors which
areas challenge beginners most, and additionally supports the improvement of
teaching strategies by enabling instructors to compare the code that a cohort of
students produces when given different teaching interventions.
1.2 Challenges
Detecting and categorizing defects in students' programs is a challenging problem.
Programming assignments, as a kind of summative assessment, are set and marked
to summarize students' performance at a particular time. In this study, automatic
marking and formative feedback are generated to help students perform a
self-assessment process, thereby encouraging them to master the practical
knowledge, skills and tools needed for programming. A lot of learning happens
when students undertake programming tasks, so the programming assessment in this
study is formative rather than summative. Additionally, hand marking of
assignments is time consuming because the content of every submission must be
inspected thoroughly. Automatic approaches can be used for both static and
dynamic analysis of programs. However, automatic systems are not as flexible as
human assessors and so may misclassify innovative solutions (Ala-Mutka 2004).
A compromise is to use automatic tools to assess some aspects of a program
and manual approaches to assess others (Ala-Mutka 2004). First, feedback is
generated from automatic measurement tools. Subsequently, manual code
inspection supports a further code review that may capture more subtle defects in
student programs. We use this mixed automatic and manual approach for analyzing
student programs to develop our novice defect taxonomy.
Another challenge for defect measurement is how to generalize the contents of
defects and match them with existing taxonomies. Labeling errors is affected by
assessors' cognitive knowledge and their programming experience. When
classifying a defect, it can be difficult to match the defects captured by program
measurement with a category of a defect taxonomy. For example, the difficulty of
placing an error into the semantic or the logic group has been identified as a
problem by Hristova et al. (2003). Our novice defect taxonomy provides a
hierarchy of categories to assist in the accurate classification of defects
captured by assessing student programs.
1.3 Approach and Scope
The purpose of our study is to develop a new defect taxonomy for classifying the
defects found in students' programs. The methodology we have used comprises the
following steps:
1. We derive categories of defects for students' programs from existing defect
taxonomies and make a defect list covering the main findings (Chapter 2);
2. We first use automatic analysis tools to capture the defects in a series of
laboratory exercises completed by a student cohort. Code inspection is
then performed to refine the defect list, and new defects identified in this
process are added to the list. This step is repeated until no new defects are
identified; a sketch of this loop follows the list (Chapter 3);
3. We develop a hierarchical taxonomy containing all defects on the list. We
specify a definition and a detection approach for each defect (Chapter 4);
4. We evaluate the defect taxonomy by using it to capture defects in nine
exercises completed by four cohorts (1271 submissions in total). Both
defect types and distributions for each exercise have been identified. The
information provides feedback to help students produce high quality code
and can be used to improve teaching strategies (Chapter 5).
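The refinement in step 2 is a fixpoint iteration: automated detection and manual inspection are repeated until a full pass adds no new defect type to the list. A minimal sketch of the loop, with hypothetical detectWithTools and inspectManually helpers standing in for the measurement framework of Chapter 3:

    import java.util.LinkedHashSet;
    import java.util.Set;

    class DefectListRefinement {
        // Repeats tool-based detection plus code inspection until a pass over
        // all submissions finds no defect type that is not already on the list.
        static Set<String> refine(Iterable<String> submissions) {
            Set<String> defectList = new LinkedHashSet<>();
            boolean foundNew;
            do {
                foundNew = false;
                for (String submission : submissions) {
                    Set<String> observed = new LinkedHashSet<>();
                    observed.addAll(detectWithTools(submission));  // static + dynamic analysis
                    observed.addAll(inspectManually(submission));  // line-by-line inspection
                    foundNew |= defectList.addAll(observed);       // true if anything new appeared
                }
            } while (foundNew);
            return defectList;
        }

        // Hypothetical stand-ins for the real detection steps.
        static Set<String> detectWithTools(String submission) { return Set.of(); }
        static Set<String> inspectManually(String submission) { return Set.of(); }
    }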
Figure 1.1. Research Areas Addressed in this Study
This study focuses on the areas shown in Figure 1.1. Data acquired from
practical sessions is combined with theoretical knowledge to evaluate the quality
of student assignments. Information on defects gained from diagnosing submissions
provides researchers with a new reference for defect analysis. The measurement
strategy in this research combines objective measures, using assessment tools and
techniques, with subjective measures, where instructors perform code inspections
on students' Java programs. We evaluate the NDT using both qualitative and
quantitative analysis of a corpus of student programs.
[Figure 1.1 is a diagram centered on Software Defect, with three branches, each
box posing a guiding question. Knowledge (how to understand software defects in
programs and organize them into NDT?) covers Defect Type (what factors are
measured to assure software quality?) and Defect Pattern (what are common
patterns of defects observed during diagnosis?). Measurement Strategy (how can we
measure defects in software?) covers Assessment Techniques (what techniques are
used to measure defects in software?), Assessment Tools (what tools can be used
to efficiently measure defects in software?) and Humans (how do humans measure
defects in software?). Evaluation (how can we evaluate NDT?) covers Qualitative
Analysis (what kinds of defects do students encounter?), Quantitative Analysis
(what are the distributions of defects in programs?) and Improvement (how can we
improve teaching strategies to reduce defects in student programs?).]
Specifically within these research areas, our research targets the following topics:
Knowledge
Defect Type
Defects are grouped by the types of functionality, programming style, language
syntax and code completeness.
Defect Pattern
We discover the defect patterns made by students when they complete a series of
laboratory exercises. The data from defect analysis suggests ways to improve
teaching strategies in an introductory programming course.
Measurement Strategy
Assessment Techniques
It is necessary to recognize the techniques used for assessing students' code
automatically. Two techniques are employed for analysis: static analysis and
dynamic analysis. Assessment is performed automatically with these two techniques
to capture defects in assignments.
Assessment Tools
Several tools, such as a suite of test cases and a code style checker, are provided
to students to allow them to assess their code. These tools are also employed to
help identify defects that are then classified using the NDT.
Code Inspection
Instructors discover defects in students' assignments by performing a thorough
line-by-line inspection of programs. Manual inspection helps to identify
additional defects in students' programs.
Evaluation
Qualitative View
A defect taxonomy is a classification scheme that represents software defect types
in a systematic structure.
Quantitative View
Counting the numbers of defects in a large number of assignments identifies the
most frequent defects novices encounter. Lecturers are able to improve their
courses by placing greater emphasis on the areas identified.
Improvement
Analysis using NDT reveals the challenging areas that students face. This
information provides evidence for improving the setting and grading of
programming assignments.
1.4 Research Questions
This dissertation presents a new defect taxonomy for classifying defects
discovered in students' programming assignments. Our study aims to organize
defects in a systematic way and provide a tool for evaluating the computer
programs of individuals or a cohort. We divide the overall problem into eight
research questions. Answering them helps us understand how NDT can be used to
detect faults in programs. There are five overarching research questions
(Sections 1.4.1, 1.4.3, 1.4.5, 1.4.6 and 1.4.7) that we address, and three
sub-questions (Sections 1.4.2, 1.4.4 and 1.4.8) that extend the breadth of the
thesis scope.
The data used to answer the following questions are gathered from qualitative
and quantitative analysis of nine assignments completed by four cohorts. Patterns
derived from this analysis provide insights into the ways students produce
defects. The defect patterns found in this study help instructors improve the
setting and grading of practical exercises. Evaluation using NDT reveals novices'
common problems in programming and enables instructors to improve their teaching
by exposing the main challenges students face.
1.4.1. What types of defects are identified from student submissions?
This research question explores what types of defects occur in student
assignments. We analyse the areas of functionality, style quality, compilation
behavior and code completeness. NDT is used as a tool to classify the defects.
1.4.2. How are these types related to existing defect taxonomies?
Many defect categories in NDT are derived from previous studies; NDT is
effectively a superset of prior work, and some of its defect categories have never
been identified by previous studies. Another concern is that most previous studies
of defects in students' code are limited to only one defect area, such as the
syntax error analysis by Hristova et al. (2003) or the logic error analysis by
Ahmadzadeh, Elliman & Higgins (2005). This study, in contrast, covers four areas
of defects in programs.
1.4.3. What are the most common defects made by novices?
Assessing many different program aspects helps to determine which aspects
challenge students the most when they program. This study assesses students'
assignments and captures defects of four types: functionality defects, style
defects, language syntax defects and code completeness defects.
1.4.4. Are these defects consistent with previous work?
Prior published work concentrates on explaining the content of defects and
exposing the distribution of each defect. Qualitative analysis in this study
emphasizes identifying the most common defect types, while counts of defects are
used to evaluate our taxonomy from a quantitative view. Additionally, these
counts show areas where students are, or are not, having problems.
1.4.5. What do cohort defect patterns tell us about programming exercises?
Patterns derived from empirical work demonstrate the problems students
encounter when they complete a laboratory task. For example, if many students
postponed submitting, then the exercise may have been too hard. The analysis of
defect patterns in a cohort contributes to the improvement of practical exercises
in an introductory programming course.
1.4.6. How does the provision of formative feedback with programming
support tools affect the defect rates in submitted assignments?
Students are provided with tools for self-assessment of their code: the same
tools used to measure defects for our NDT. These tools generate formative error
messages that warn students about defects in their programs and provide
suggestions on how to correct them. Because the teaching laboratory environment
changes from year to year, it is possible to use the NDT to analyze the effect of
different lab tools on student programs. We expect that students produce programs
with fewer defects when they are given automatic aids in labs.
1.4.7. Which sorts of problems can be reduced?
Previous studies argue that up to half of the mistakes made can be avoided with
better programming techniques, including better programming languages and more
comprehensive test tools (Endres 1975). In this study, quantitative analysis is
used to expose the common defects students are prone to make.
1.4.8. What is the related strategy in programming teaching?
In the last question, we list the most common defects identified using NDT.
For each defect identified, a possible teaching solution is proposed to suggest
improvements to teaching strategies. We believe instructors could employ these
solutions to minimize these types of defects and reduce the defects in
submissions.
1.5 Contribution
In this dissertation, we analyse a large number of student programs (1271
submissions) completed by four student cohorts and identify a range of defect
types covering four areas. The main contributions of this dissertation are:
Identification of new defect categories specific to novice programmers,
covering code completeness defects, compilation defects, functional
defects and evolvability defects;
Establishment of a taxonomy, called the Novice Defect Taxonomy (NDT), built
from the new defect categories identified in the empirical work;
Reporting of the frequency of each defect in NDT, which helps instructors pay
more attention to the most common defects and helps novices avoid making
them;
Description of a defect measurement mechanism using both automated approaches
and manual methods to detect defects in programs.
1.6 Thesis Outline
In this dissertation, Chapter 2 summarizes relevant theoretical studies about
novices' difficulties and existing defect taxonomies; research gaps are discussed
in Section 2.4. Chapter 3 introduces the measurement process we use and outlines
the testing techniques and related tools employed in this study. Chapter 4 then
defines the Novice Defect Taxonomy (NDT) and its measurement protocols.
Chapter 5 summarizes the results of experiments using NDT to analyze students'
code and answers our research questions. Chapter 6 proposes directions for future
work and concludes this study.
Chapter 2 Literature Review
Section 2.1 summarizes previous theoretical studies about software defects and
existing defect taxonomies. Prior work by other researchers has focused on the
characteristics of both novices and experts in programming, qualitative and
quantitative analysis of software defects, and the analysis of defects in
different programming languages. A summary of approaches and techniques for
measuring defects in software is presented in Section 2.2. Section 2.3 then
compares the coverage of existing defect taxonomies with this study. Finally,
Section 2.4 identifies three open research problems addressed in this
dissertation.
2.1 Specifications of Defects
The specific focus of this study is on software defects in students' assignments
and previous taxonomies for classifying those defects.
2.1.1 What is a Software Defect?
Defects play an important role in software because they lead a program to act in
an unintended way. Existing studies have investigated many terms related to
software defects: error, fault, failure and problem. Their definitions vary
greatly from paper to paper. We first review the previous literature on the
definitions of these terms, and then give a clear definition of a defect in a
student's assignment before we undertake the analysis.
Definition: “The word anomaly may be used to refer to any abnormality,
irregularity, inconsistency or variance from expectations. It may be used to refer
to a condition or an event, to an appearance or a behavior, to a form or a function”
(IEEE 2010).
Definition: An error is “a human action that produces an incorrect result”
(ISO/IEC 2009).
Definition: “A failure is an event in which a system or system component does
not perform a required function within specified limits” (ISO/IEC 2009).
Definition: A fault is “a manifestation of an error in software” (ISO/IEC 2009).
Definition: “A software problem is a human encounter with software that causes
difficulty, doubt, or uncertainty in the use or examination of the software” (Florac
1992).
Definition: Software defects are defined as follows (Mantyla & Lassenius 2009,
p. 2):
(1) Failures “should be counted as defects if they cause the system to fail or
produce unsatisfactory results when executed”;
(2) Faults are “incorrect code, which may or may not result in a system failure”;
and
(3) “Deviations from quality” are counted when program changes are made.
Definition: “A software defect is a manifestation of a human (software producer)
mistake” (IEEE 2010).
Many studies investigate defects in programs made by students (see Table 2.1)
but none of them provides a clear definition of a software defect. It is necessary
to define a software defect in a student assignment before we perform the
empirical work of analyzing defects in submissions.
Definition: A software defect, in a student programming assignment, is an
imperfection in the program that prevents it from performing in conformance with
its specifications. It may refer to serious problems that prevent the software
from being executed, or to style problems in a program that otherwise performs in
the intended way.
This dissertation focuses on identifying software defects in students'
assignments rather than assessing their causes. It provides a detailed
investigation of four areas: functionality, code style, language syntax and code
completeness.
2.1.2 Defect Taxonomies
A defect taxonomy is used to classify defects in software. It provides a unique
category for each detected defect and a systematic way to "measure the
number of defects remaining in the field, the failure rate of the product, the
defect detection rate" (Musa, Iannino & Okumoto 1987).
Definition: A software defect taxonomy is a hierarchical classification scheme
used for categorizing software defects. A hierarchical taxonomy makes it possible
to classify defects in a reproducible way.
Identification of defects in software helps students realize the challenges they
face in achieving high quality code (Mantyla & Lassenius 2009). A defect taxonomy
offers programming beginners a classification scheme for identifying defects in
students' submissions. Prior studies of defect taxonomies may analyze the same
area, so their findings may consistently "share many defect types" (Mantyla &
Lassenius 2009). However, these taxonomies may also "miss defect types or use
restrictive definition" (Mantyla & Lassenius 2009). For example, the problems of
unexpected submission or code style have received little attention in prior work.
In this study, we propose a taxonomy that contains four areas: functionality,
style quality, language syntax quality and code completeness, which will be
discussed in Chapter 4.
2.1.2.1 Novices vs Experts
Existing defect taxonomies target different groups, from novices and advanced
students to more experienced programmers. Robins, Rountree & Rountree (2003)
described the process of a novice becoming an expert as passing through five
stages: "novice, advanced beginner, competence, proficiency and expert".
Many previous papers summarize novices' characteristics in programming.
Typical curricula usually start by teaching a set of concepts of a programming
language. First-year students often have many misunderstandings when they
apply these concepts in practice. Novices believe they understand the concepts
but fail to "apply relevant knowledge" (Robins, Rountree & Rountree
2003). Additionally, novices may understand the programming syntax and
semantics line by line, but fail to combine these features into a valid program.
Robins, Rountree & Rountree (2003) argued that novices' deficits were in "the
surface understanding and various specific programming language constructs".
Novices spend little time on program comprehension and planning, while experts
spend a large proportion of their time on these activities (Robins, Rountree &
Rountree 2003). Ala-Mutka (2004) also argued that beginners lacked background
knowledge and were "limited to surface knowledge 'line by line' rather than
larger program constructions". Kessler & Anderson (1989) argued that novices
"often apply the knowledge they learned improperly". The ability to write good
programs is not directly linked to the ability to understand a program: although
novices can write functionally correct code, they may still have problems
understanding code written by others (Ahmadzadeh, Elliman & Higgins 2005).
By comparison, studies of experts in programming focus on their knowledge
structures and their programming strategies. Experts focus on the representation
of sophisticated knowledge and problem solving strategies rather than surface
knowledge. For example, the defect of submitting the wrong file occurred
frequently for novices (Coull et al. 2003) but would never occur for experts.
Detienne (1990) modeled the way experts organize knowledge and put programming
skills into practice. Compared with novices, experts master programming knowledge
and skills better and apply them more thoroughly in programming.
Another source of information about novices is students' debugging performance
in programming (Ahmadzadeh, Elliman & Higgins 2005). Program debugging is an
important programming activity. Although various approaches have been introduced
in teaching, students mainly gain their debugging strategies from practical
sessions. Previous research has additionally paid attention to the differences in
debugging behavior between experts and novices. Gugerty & Olson (1986) conducted
two experiments comparing the debugging habits of experts and novices. In the
debugging process, experts could fix more bugs and complete their work in a
shorter time; they spent more effort on program comprehension and remembered
program details much better than novices. Compared with experts, novices obtained
knowledge inconsistently: they might isolate one error and make the proper
correction, but fail to correct further errors and fail to apply debugging
techniques in all situations. Novices were always struggling with errors in
program compilation because they lacked debugging and programming experience.
This analysis emphasizes defects made by students rather than professional
programmers. Since novices make different errors than experts, a taxonomy for
them needs different categories. Our defect taxonomy must therefore be more
stringent on the quality of code completeness and code syntax, areas that might
not be addressed by previous studies.
2.1.2.2 Qualitative Analysis vs Quantitative Analysis of Defects
Qualitative analysis identifies defect types in students' programs. Coull et al.
(2003) developed a module to extract students' compiler error messages. All
common compiler errors were classified into four types: "files not added,
incorrect case, ; expected and } expected" (Coull et al. 2003). The last two types
(; expected and } expected) conform to compiler errors identified by Jackson,
Cobb & Carver (2004).
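Both error types are easy to reproduce. The deliberately ill-formed fragment below (a hypothetical example, not taken from the student corpus) triggers them; the exact wording of the messages varies between compiler versions, with recent javac reporting the unclosed class as "reached end of file while parsing".

    // Deliberately ill-formed Java: this class does NOT compile.
    public class MissingTokens {
        int count;

        void increment() {
            count = count + 1   // missing semicolon: javac reports "';' expected"
        }

        void reset() {
            count = 0;
        }
    // the closing brace of the class is omitted: reported as "'}' expected"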
However, the findings of these studies were limited by data collected from a
single semester-size cohort only; the results of error analysis may fluctuate
because the sample is limited. Many efforts have also been devoted to identifying
the types of defects made by professionals. The IEEE classification is applicable
to classifying defects in any software and to any phase of the project, product
or system life cycle (IEEE 2010).
Helping students to identify their style defects benefits them in improving their
code quality. Mantyla & Lassenius (2009) showed that style problems account for
approximately 75% of the defects detected in software. Their study showed that
style defects were a large proportion of the total count compared with other
defect types found in programs by both novices and professional engineers.
Comprehensive information about programming defects in the Orthogonal Defect
Classification (ODC) model was reported by Chillarege et al. (1992). Their ODC
model extracted cause-effect relationships between defects and code development,
and associated defects with the development processes. The ODC model, however,
covered only defect categories of high occurrence and did not provide a low-level
taxonomy for each defect.
Quantitative analysis involves counting the number of defects in a collection
of software modules: the defects are first classified and then counted. Truong,
Roe & Bancroft (2004) listed the four logic errors that occurred most frequently
in novice programs: "omitted break statement in a case block", "omitted default
case in a switch statement", "confusion between instance and local variables" and
"omitted call to super class constructor". Jackson, Cobb & Carver (2004)
developed an error collection system to collect all syntax errors and explored
the most frequent syntactic errors in the collection. All defect taxonomies count
defect numbers, but our taxonomy is also evaluated from a qualitative analysis
perspective so as to refine its defect categories.
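The first and third of Truong, Roe & Bancroft's logic errors can be illustrated together. The hypothetical fragment below compiles cleanly yet contains both an omitted break (case 9 falls through into case 8) and a local variable that shadows the instance variable it was meant to update:

    public class GradeBand {
        private String band;              // instance variable

        public void classify(int mark) {
            String band;                  // local variable shadows the field:
                                          // "confusion between instance and local variables"
            switch (mark / 10) {
                case 10:
                case 9:
                    band = "HD";
                    // omitted break: execution falls through into case 8
                case 8:
                    band = "D";
                    break;
                default:
                    band = "other";
            }
            // the field this.band is never assigned, so callers still see null
        }
    }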
2.1.2.3 Object Oriented Programming Languages vs Procedural Languages
Many universities have changed from teaching novices a procedural programming
language in their first year to teaching an object oriented programming language
such as Java. Prior studies investigated defects in students' programs from a
language-based perspective. Lahtinen, Ala-Mutka & Jarvinen (2005) conducted an
international survey of students' programming difficulties in learning Java or
C++, from both students' and teachers' perceptions. They found that the practical
learning environment benefits programmers most, and the experience gained in
practical situations enhances the learning of concepts. Another interesting
finding is that teachers may recognize beginners' deficiencies better than the
novices themselves do. Findings about students' deficiencies can be evidence for
developing learning materials to overcome the difficulties they encounter.
Robins, Haden & Garner (2006) derived a problem list by conducting a survey of
the relevant literature; the list was evaluated by exploring the distribution of
its problem types and was used to direct information from code diagnosis toward
improving the design and delivery of exercises. In this study, we develop a
defect classification as a resource for Java, an object oriented language,
whereas most prior work focused on procedural languages.
Three studies have highlighted students' problems in using procedural
languages. Pea (1986) identified persistent conceptual bugs of parallelism,
intentionality and egocentrism, derived from a bug called the "superbug"; the
study also identified novices' characteristics in writing and understanding code.
Chabert & Higginbotham (1976) investigated novice errors in Assembly Language
assignments, and listed the types and frequencies of nine errors discovered by
experiment. Kopec, Yarmish & Cheung (2007) identified students' errors and
located them in program components when students used the C language.
2.2 Measurement of Defects
2.2.1 Automatic Assessment
Automatic assessment has been introduced in computer science education to
assess students' programs. It offers instructors an efficient tool for grading
programs and offers students timely feedback for fixing defects in their
programs. In this study, the assessment of more than 1200 classes would require a
significant amount of instructors' time and effort, so automated assistance
reduces the workload of instructors.
Ala-Mutka (2005) summarizes several code features that can be measured
automatically. Once students submit their source code to a central repository,
several code aspects can be assessed dynamically or statically. Static
evaluations are performed by collecting information from the source code without
executing it; dynamic assessments evaluate students' programs by executing them
(Ala-Mutka 2005). In dynamic assessment, automated tools can also assess users'
testing skills when students are asked to design their own test cases and use
them to test their code (Edwards 2004). Automated tools can assess some specific
features as well (e.g. language-specific implementation issues, memory
management) (Ala-Mutka 2005). Dynamic tools execute assignments against a set of
test cases to measure the programs' functional correctness. Prior work shows that
taking a unit testing approach has a positive effect not only on the students at
the two extremes of the "two humped camel" distribution but also on the middle
learners (Whalley & Philpott 2011). Static tools evaluate code features (e.g.
coding style, design and software metrics) without executing the programs.
Truong, Roe & Bancroft (2004) developed a measurement framework that uses static
tools to compare the software metrics and structural similarity of a submission
with a suggested model solution.
For dynamic assessment, JUnit (JUnit4, n.d.) is used to test code
functionality. Static analysis is performed using Checkstyle (Checkstyle 2001)
and PMD (PMD 2002). These tools are discussed in the next chapter.
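As an illustration of the dynamic side, a JUnit 4 test in the style of the frequencyOf() tests of Figure 3.5 might look like the sketch below. The TextAnalyser API shown here is assumed for the example, since only the class and method names appear in this dissertation.

    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    // Minimal stand-in for the assignment class; only the assumed API matters here.
    class TextAnalyser {
        private final String text;
        TextAnalyser(String text) { this.text = text; }

        // Returns how many times c occurs in the analysed text.
        int frequencyOf(char c) {
            int count = 0;
            for (int i = 0; i < text.length(); i++) {
                if (text.charAt(i) == c) { count++; }
            }
            return count;
        }
    }

    public class TextAnalyserTest {
        @Test
        public void countsRepeatedCharacters() {
            assertEquals(3, new TextAnalyser("banana").frequencyOf('a'));
        }

        @Test
        public void returnsZeroForAbsentCharacters() {
            assertEquals(0, new TextAnalyser("banana").frequencyOf('z'));
        }
    }

Running such a suite against a submission reports functional defects as failed test cases, which is how the dynamic side of the measurement framework surfaces its findings.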
2.2.2 Code Inspection
Automatic assessment is widespread in software engineering education. However,
automatic systems may be inflexible and so unable to award marks for innovative
solutions (Ala-Mutka 2005). Jackson (2002) argues that automated assessment can
be combined with human components to assess code quality. Code inspection is a
systematic approach of looking through the source code line by line, and can be
combined with automatic analysis as a compromise to overcome the disadvantages of
purely automatic approaches. This semi-automatic approach, combining code
inspection with automatic assessment, is used in this study to assess the
assignments of large student cohorts. First, error messages from automatic
assessment are used to extract the assignments containing defects. Then, code
inspection is used to identify defects in students' assignments. During code
inspection, it is possible to determine which category a defect belongs to. For
example, placing a defect into the "assignment malformed" or the "assignment
missing" type requires human effort because automated tools cannot distinguish
between these defects automatically. How each detected defect is identified and
classified is described in Section 3.3.6. Code inspection helps to avoid
uninformative or inappropriate feedback that would give inaccurate information
about subtle defects in software.
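As a hypothetical illustration of why human judgment is needed here: under an assumed specification that a deposit must add an amount to a balance, both of the methods below fail the same functional test, and only inspection of the source reveals which category applies.

    // Assumed specification: deposit(amount) adds amount to the balance.
    class Account {
        private int balance;

        void depositMalformed(int amount) {
            balance = amount;   // the assignment is present but wrong:
        }                       // "assignment malformed"

        void depositMissing(int amount) {
            // no assignment at all: "assignment missing"
        }
    }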
2.3 Categories of Defects
A comparison of the high level defect categories of this study and of previous
work is shown in Table 2.1. In this study, novice defects fall into four
categories: completeness, language syntax, functionality and evolvability. The
right-hand column, Assurance Perspective, specifies whether a previous software
defect study takes an educational or a professional perspective.
Previous studies & our NDT           | Completeness | Syntax | Functionality | Evolvability | Educational (E) or Professional (P)
Ahmadzadeh, Elliman & Higgins 2005   | --           | √      | √             | --           | E
Chillarege et al. 1992               | --           | --     | √             | √            | P
Coull et al. 2003                    | √            | √      | --            | --           | E
Hristova et al. 2003                 | --           | √      | --            | --           | E
IEEE 2010                            | --           | --     | √             | √            | P
Jackson, Cobb & Carver 2004          | --           | √      | --            | --           | E
Kopec, Yarmish & Cheung 2007         | --           | √      | √             | --           | E
Mantyla & Lassenius 2009             | --           | --     | √             | √            | P
NDT in this research                 | √            | √      | √             | √            | E
Table 2. 1. Comparison of Defect Taxonomies (Main Categories)
Table 2.1 compares the quality assurance aspects covered by several previous
studies with those covered by this study. The majority of educational studies
focus on code functionality defects or syntactic problems in programs, whereas
investigations of professional programmers mainly emphasize functionality
validation of the software and code evolvability. Only one study (Coull et al.
2003) in Table 2.1 addresses the problem of incomplete code. Overall, language
syntax and code functionality have been widely studied, whereas defects of code
style and code completeness are addressed by only a few studies. The taxonomy in
this dissertation (NDT), developed from an educational perspective, covers all
four areas of novice defects.
2.4 Research Gaps
This study aims to fill three research gaps: taxonomies with only high-level
categories; limited data samples for empirical analysis; and ambiguous
classifications in defect analysis.
There are several deficiencies in existing studies. Several studies specify
taxonomies to fit defects identified in software written by professional
programmers, but findings from these studies are difficult to fit to students'
programs (Ahmadzadeh, Elliman & Higgins 2005; Hristova et al. 2003). These
taxonomies also fail to provide lower level categories for the defect types they
identify (Chillarege, Kao & Condit 1991). For example, the ODC model reported by
Chillarege, Kao & Condit (1991) provides only eight high level groups without any
information about their sub-groups. In this dissertation, we create, modify and
evaluate a defect taxonomy covering high as well as low level categories to
increase scientific knowledge of novice programmers.
Relying on only one data source may be a weakness for defect analysis. Some
previous results have suffered from limited samples collected from only one
semester-size cohort (Mantyla & Lassenius 2009). Results may fluctuate when
analyzing data from a limited sample and can be improved by enlarging the size of
the cohorts (Coull et al. 2003). In this study, data is collected from a series
of laboratory exercises over several semesters.
It is difficult to accurately match the error types identified by the
measurement process with existing defect taxonomies, because more than one
category of a defect taxonomy may match the defect found. To avoid this drawback,
we adopt the rule that if the defect content matches many types, only the type
with the highest level is selected. In NDT, a defect type at a smaller depth has
a higher level; at the same depth, the defect type with the lower taxonomy code
has the higher level.
Chapter 3
Data Collection for the Defect Taxonomy
The Novice Defect Taxonomy (NDT) has been developed by collecting data from a
large number of programming assignments completed by four cohorts. Section 3.1
describes the cohorts selected for measuring defects and summarizes the
assignments they completed. Section 3.2 discusses the data collection process.
The software attributes to be measured and the measurement approaches taken are
presented in Section 3.3. Section 3.4 summarizes the measurement instruments used
in this study; for each instrument, the following issues are discussed: how the
tool works, whether it works as expected, whether the tool is suitable as a
laboratory aid, and how it contributes to this experiment. A comparison of the
detection ranges of static tools is summarized in Section 3.5, followed by a
comparison of the use of static tools in practice in Section 3.6. Lastly,
limitations of the experiments are discussed in Section 3.7.
3.1 Subject and Exercise Choice
This dissertation uses NDT as a tool to evaluate students' Java assignments.
Firstly, Section 3.1 outlines the factors addressed in subject and exercise
selection.
3.1.1 Subject Choice
Data is collected from students enrolled in the unit Java Programming (JP) and
the unit Software Engineering (SE) at the University of Western Australia. Both
units teach the Java programming language. Table 3.1 shows the cohort size,
previous experience and laboratory support of the four cohorts.
Cohort ID | Cohort Size | Typical Programming Experience in Semesters | Java Tests Provided | JUnit | Checkstyle | PMD | IDE
A         | 94          | 0                                           | √                   | --    | --         | --  | BlueJ
B         | 184         | 0                                           | √                   | --    | --         | --  | BlueJ
C         | 75          | >=1                                         | √                   | √     | √          | --  | Eclipse
D         | 200         | 0                                           | √                   | √     | --         | √   | BlueJ
Table 3. 1. Student Experience and Laboratory Support for Different Cohorts
Regarding typical programming experience in semesters, students in Cohorts A, B
and D, enrolled in the unit Java Programming, have no previous experience of Java
and need to master the basic concepts of Java programming. Students who have
completed the Java Programming unit then take the unit Software Engineering.
3.1.2 Exercise Choice
Students submit their solutions to the cssubmit system, an on-line system
managing assignment submission, marking and feedback (McDonald 2009). Submissions
are selected from this repository for our empirical work. Only small-scale
programming exercises are analyzed in this study. Prepared tests written by
instructors help novices to write well-formed programs. For each cohort,
assignments from two or three exercises are selected as the data for our
empirical work. Typical exercises are available
on-line from http://undergraduate.csse.uwa.edu.au/units/CITS1200
and http://undergraduate.csse.uwa.edu.au/units/CITS1220.
For each assignment, an ID is given to identify the assignment uniquely. This
ID comprises a cohort ID and a numeric suffix; the suffix gives the sequence of
the assignment within the cohort. For example, A1 is the first assignment
completed by Cohort A. Table 3.2 shows the scale of these assignments in terms of
the size of a sample solution written by the course instructor.
Assignment ID | Student Cohort | Public Attributes | Private Attributes | Methods | Non Commented Lines Of Code
A1            | A              | 7                 | 0                  | 10      | 48
B1            | B              | 7                 | 0                  | 10      | 51
B2            | B              | 2                 | 0                  | 8       | 31
B3            | B              | 13                | 4                  | 8       | 98
C1            | C              | 4                 | 0                  | 9       | 46
C2            | C              | 0                 | 0                  | 5       | 27
C3            | C              | 1                 | 0                  | 9       | 29
D1            | D              | 6                 | 0                  | 11      | 49
D2            | D              | 4                 | 0                  | 9       | 66
Table 3. 2. Sizes of Java Assignments
In their assignments, students practice using language constructs such as
conditional statements and for loops. Labs A1, B1, C1 and D1 require students to
complete methods involving if statements but no loops. In C2, students are
required to write six methods for a simple calculator. Some advanced labs require
arrays and arraylists: B2 and D2 test students' ability to manage strings and
array structures, B3 uses two-dimensional arrays, and C3 evaluates students' use
of the Java library class arraylist.
Complexity Level       | Assignment ID | Exp. | if…else | for | array | arraylist | exception
Level 1 (Low)          | A1            | √    | √       | --  | --    | --        | --
Level 1 (Low)          | B1            | √    | √       | --  | --    | --        | --
Level 1 (Low)          | D1            | √    | √       | --  | --    | --        | --
Level 1 (Low)          | C1            | √    | √       | --  | --    | --        | --
Level 2 (Intermediate) | B2            | √    | √       | √   | √     | --        | --
Level 2 (Intermediate) | D2            | √    | √       | √   | √     | --        | --
Level 2 (Intermediate) | C2            | √    | √       | √   | √     | --        | --
Level 3 (High)         | B3            | √    | √       | √   | √     | --        | √
Level 3 (High)         | C3            | √    | √       | √   | --    | √         | --
Table 3. 3. Java Language Constructs used in Assignments
Table 3.3 summarizes the language constructs used in the assignments. The nine
assessed labs are classified into three levels of complexity, determined by the
constructs involved as well as the length of the assignment methods. In Table
3.3, Level 1 labs (low complexity) use only expressions and conditional control
statements. Previous studies argue that structures such as arrays and loops
really challenge students in an introductory programming course (Robins, Rountree
& Rountree 2003); labs using such structures are placed in Level 2. Level 3 labs
require students to use arraylist (C3) and exception handling (B3).
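For a sense of scale, a Level 1 exercise method in the style described here (expressions and if...else, no loops) might look like the following hypothetical fragment; it is not one of the actual assignment solutions.

    // Hypothetical Level 1 method: expressions and if...else only, no loops.
    public class Fares {
        // Returns the ticket price for a traveller of the given age.
        public int fareFor(int age) {
            if (age < 5) {
                return 0;       // infants travel free
            } else if (age < 16) {
                return 2;       // child fare
            } else {
                return 4;       // adult fare
            }
        }
    }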
Complexity Level       | Assignment ID | Number of Subjects | Number of Submissions | Number of Compiled Submissions
Level 1 (Low)          | A1            | 94                 | 93                    | 85
Level 1 (Low)          | B1            | 184                | 181                   | 172
Level 1 (Low)          | D1            | 200                | 196                   | 193
Level 1 (Low)          | C1            | 75                 | 70                    | 61
Level 2 (Intermediate) | B2            | 184                | 143                   | 132
Level 2 (Intermediate) | D2            | 200                | 128                   | 109
Level 2 (Intermediate) | C2            | 75                 | 71                    | 59
Level 3 (High)         | B3            | 184                | 155                   | 133
Level 3 (High)         | C3            | 75                 | 55                    | 44
Table 3. 4. Cohort Size and Submissions for Each Assignment
Table 3.4 shows the number of subjects, the number of submitted assignments and
the number of submissions that compiled successfully for each lab assessed in
this study. The majority of subjects were able to upload their submissions on
time and to complete the labs of low complexity: more than 87% of submissions
compiled in Level 1. A high proportion of students (23% in B2 and 36% in D2)
failed to submit their programs for the Level 2 labs. Compared with Level 2, a
larger proportion of cohort B submitted their programs in Level 3. At Level 3,
29 students (15.8%) in B3 and 20 students (26.7%) in C3 failed to submit
programs. The rate of missing submissions conforms to the observation of Robins
(2010) that in a typical programming course the rate of assignment submission
falls during the semester.
3.2 Data Collection Mechanism
Programming assignments develop students' abilities to write well-formed programs.
Completing a laboratory assignment is a complicated process involving many
activities. Figure 3.1 outlines the process a student may go through to complete an
assignment.
Figure 3.1. Recommended Process for Completing a Programming Assignment
In Figure 3.1, students start by reading and understanding the assignment requirements
published on the unit webpage. Subsequently, they make preliminary plans, complete
coding and compile their programs. If there are no syntax errors, testing programs are
available from the webpage to capture functional faults. Unit testing tools (JUnit test
suites) and style checking tools (Checkstyle and PMD) are given to Cohorts C and D.
Cohorts A and B, however, have no test cases or style tools, only a debugging
environment (BlueJ). Students in Cohorts C and D use the support tools to run their
programs against the testing programs. Error messages generated by the support tools
then give students detailed information on the faults in their programs. Students are
encouraged to resubmit assignments if they wish. Subsequently, students are provided
with feedback generated from instructors' testing of their submissions. This feedback
offers students suggestions on how to correct the defects in their programs.
The process shown in Figure 3.1 is an ideal process that students are encouraged to
undertake, but they may skip steps when completing a task. Students in the cohorts
analyzed by this study are encouraged to plan their work before they code and to
test their programs before they submit them. Students are advised to perform this
pre-testing themselves because both the quality of their code style and the correctness of
its functionality will subsequently be scored by instructors. The phase "making a
program plan" may be overlooked, but "Code Testing with JUnit/CS or PMD" is likely to be
implemented by most students because they want to get higher scores.
3.3 Defect Measurement
This dissertation aims to reveal which areas prevent students from achieving high
quality code. In this section, we first introduce the defect attributes (Section 3.3.1). The
counting approaches taken for measuring the attributes are discussed in Section 3.3.2.
Section 3.3.3 summarizes the defect measurement framework, and detection for each
attribute is described in Sections 3.3.4 to 3.3.7.
3.3.1 Software Attributes for Measuring
Assessing different program aspects helps to determine which aspects challenge students
the most when they program. Before we perform defect measurement on students'
assignments, it is necessary to define the terms software attribute, quality assessment
and measurement instrument.
Definition: A software attribute is a property of a program that can be evaluated.
Several software attributes are considered in our study. Each attribute is measured and
feedback is generated as follows:
- validate language syntax and provide feedback on program compilation errors;
- validate code completeness and generate a report on code compilation;
- validate functional correctness and provide suggestions for repairing functional defects;
- validate coding standards and generate a report on style violations.
Figure 3.2. A Summary of Measurement Validation Concepts
[Figure 3.2 maps each software attribute to its quality assessment and measurement instruments: syntax and completeness are assessed by compilation, using the Java compiler; functionality by dynamic analysis and code inspection, using JUnit; style by static analysis and code inspection, using Checkstyle and PMD.]
Definition: Quality assessment quantifies the extent to which software attributes conform
to software requirements.
Definition: Measurement instruments are devices used to measure software attributes.
This study introduces a defect measurement framework. Both static and dynamic
analysis are used to analyze students' programs.
Definition: Static Analysis is an "evaluation that can be carried out by collecting
information from program code without executing it." "Static analysis may reveal
functionality issues that have been left unnoticed by the limited test cases" (Ala-Mutka
2005).
Definition: Dynamic Analysis "is carried out against several test data sets, and each of
them is evaluated individually, starting from the initial state and completing all the
processing before the assessment of the output or the return value" (Ala-Mutka 2005).
3.3.2 Defect Counting Approaches
Data collected from the defect measurement framework is used to create a new defect
taxonomy. Basili & Perricone (1984) quantified errors in a software project from both a
textual and a conceptual view, depending on the purpose of the error analysis. In
the context of a textual count, defects resulting from the same problem are counted
repeatedly, as many times as they occur. The second measure counts the
conceptual effect of a defect across the source code: the conceptual approach counts the
problem only once, however many times it occurs. Many textual counts may therefore
yield only one defect under the conceptual signature measure. In this analysis, both
conceptual signature counts and textual signature counts are used to measure students'
defects in submissions. They are defined as follows:
Definition: Textual Signature Count is the total number of defects detected in all Java
class source code completed by a subject cohort. The cohort can be a single cohort (A,
B, C or D) or a combination of several cohorts.
Definition: Conceptual Signature Count is the number of subjects who made at least
one error in the Java class source code completed by a subject cohort. The cohort can be
a single cohort (A, B, C or D) or a combination of several cohorts.
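As a worked illustration of the two counts, consider the following sketch (the class and its values are invented for this purpose, not taken from a student submission). The same magic-number problem occurs three times in one student's class, so the textual signature count records 3 defects, while the student contributes 1 to the conceptual signature count:

public class InterestCalculator {
    private double balance = 200.0;

    // The unexplained constant 0.05 is the same underlying problem each time.
    // Textual signature count: 3 (one per occurrence in the source).
    // Conceptual signature count: 1 (this student made the error at least once).
    public double interest() {
        return balance * 0.05;
    }

    public double projectedBalance() {
        return balance + balance * 0.05;
    }

    public double earlyWithdrawalPenalty() {
        return balance * 0.05 / 2;
    }
}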
3.3.3 Defect Detection Framework
A defect measurement framework (see Figure 3.3) is proposed to analyze students'
programs. A shell script evaluates each submission automatically. The generated report
covers defects related to code completeness, compilation, functionality and evolvability.
The detection framework adopts both dynamic and static assessment to evaluate
students' programs.
Definition: Compiler Error Detection addresses incorrect compile behaviors when the
compiler translates the source code into byte code (Hristova et al. 2003).
Definition: Evolvability Fault Detection addresses defects "that affect future
development efforts instead of runtime behavior" (Mantyla & Lassenius 2009).
Definition: Functional Fault Detection addresses defects by providing inputs and
examining outputs to validate the correctness of the internal program structure.
Figure 3.3. An Overview of the Defect Measurement Process
Figure 3.3 gives an overview of our defect detection process. Students submit
their solutions to the cssubmit system, an on-line system managing assignment
submission, marking and feedback (McDonald 2009). The cssubmit system creates a
new directory for each submission. Instructors subsequently download the submissions
and run a shell script on them to measure the programs. If an assignment cannot be
compiled, an error message is generated automatically. Otherwise, static analysis
(Checkstyle and PMD) and dynamic analysis (JUnit) are performed on the assignment.
Submissions are then filtered based on the testing results, and code inspections
are conducted to capture more subtle defects in selected assignments. Detailed
descriptions of the static analysis tools (Checkstyle & PMD) and the dynamic analysis
tool (JUnit) are presented in Section 3.4.
3.3.4 Compilation Detection
The Java compiler is distributed as part of the Sun JDK package. The compiler reads
the source code and compiles it into byte code, which it stores in a class file named
classname.class. The detection checks whether a submission compiles correctly. In this
study, a missing submission is classified as NO SUBMISSION
and a file in an incorrect format is classified as UNRECOGNIZED FILE. The Java
compiler also detects syntactic errors (e.g. methods with wrong signatures) when the
submitted Java class fails to match the expected signature.
3.3.5 Evolvability Fault Detection
Evolvability detection measures whether novice code meets formal coding standards.
High-quality style attributes improve the readability and maintainability of programs.
Static Tool | Rule Set          | Rules
Checkstyle  | Naming Convention | ConstantName, LocalVariableName, MemberName, MethodName, ParameterName, StaticVariableName, TypeName
Checkstyle  | Coding            | AvoidInlineConditionals, InnerAssignment, MagicNumber, MissingSwitchDefault, EmptyBlock, EmptyStatement
Checkstyle  | Comments          | JavadocMethod, JavadocType, JavadocVariable
Checkstyle  | Complexity        | BooleanExpressionComplexity, CyclomaticComplexity, NPathComplexity
Checkstyle  | Size Violation    | MethodLength, ParameterNumber, FileLength
PMD         | Dead Code         | UnusedPrivateField, UnusedLocalVariable, UnusedPrivateMethod, UnusedFormalParameter
Table 3.5. Metrics for Evolvability Fault Detection
PMD (PMD 2002) and Checkstyle (Checkstyle 2001) are two customizable tools.
Both are offered as Eclipse and BlueJ plug-ins to help developers meet coding
standards. Checkstyle provides rules to detect style faults. In this study, 26 Checkstyle
rules are selected and customized into five groups. These rules cover the detection of
naming conventions, Javadoc comments, common coding problems, size violations and
over-complex code. PMD is additionally used to detect "dead code": code that is never
used in other parts of a program. The rules selected from Checkstyle and PMD are
listed in Table 3.5. The selected rules are a subset of each tool's full rule set, chosen
to assess the code properties most relevant to the code written by students. For example,
the rule AbstractClassName from the Naming Convention rule set was not selected
because students have not yet developed any abstract classes in their programs, while
the rule MethodName was selected to assess whether the method names in student
submissions meet the naming standards.
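To make the selected rules concrete, the following fragment (invented for illustration, not taken from a student submission) would trigger several of the Checkstyle rules listed in Table 3.5:

public class AccountExample {
    private int Balance;                  // MemberName: member names should start in lower case

    public void Deposit(int amount) {     // MethodName: method names should start in lower case
        if (amount < 0) {
        }                                 // EmptyBlock: the if block is empty
        Balance = Balance + amount + 25;  // MagicNumber: the literal 25 is unexplained
    }
}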
For illustration, feedback from detection of code complexity
(BooleanExpressionComplexity, CyclomaticComplexity, NPathComplexity) is shown
below:
TextAnalyser.java:32:36: warning: Expression can be simplified.
TextAnalyser.java:121:5: warning: Cyclomatic Complexity is 12 (max allowed is 11).
TextAnalyser.java:180:5: warning: Cyclomatic Complexity is 54 (max allowed is 11).
TextAnalyser.java:180:5: warning: NPath Complexity is 67,108,865 (max allowed is 100).
TextAnalyser.java:293:5: warning: Cyclomatic Complexity is 54 (max allowed is 11).
TextAnalyser.java:293:5: warning: NPath Complexity is 67,108,865 (max allowed is 100).
TextAnalyser.java:407:5: warning: Cyclomatic Complexity is 27 (max allowed is 11).
3.3.6 Functional Correctness Detection
According to Kaner (2003), the quality of software "is multi-dimensional" and "the
nature of quality depends on the nature of the product." One important quality criterion
is functional correctness, which has been used in many previous quality validations
(Ahmadzadeh, Elliman & Higgins 2005). Functional defects lead to unexpected program
behavior at execution time. Code functionality can be measured by
executing a set of test cases on a program (Ala-Mutka 2005). These test cases help users
to understand how the program works, to assure the quality of the program and to expose
defects within it (Allwood 1990). The sub-classes of FUNCTIONAL DEFECT in
NDT are derived from the dynamic analysis of student assignments.
How to Write Test Cases
Our goal is to perform dynamic testing on assignments to trigger failures that expose
defects. Test cases dynamically assess small testable portions of a program, usually
individual methods. JUnit is a leading unit testing
tool used to execute functional tests on students' assignments. It provides users with a
platform to write test cases and run them repeatedly. Results from dynamic testing
can verify the software's correctness under a given testing strategy, but they cannot
verify that the software will not fail under other testing conditions (Kaner 2003).
Furthermore, there is "no simple formula for generating 'good' test cases" to expose
more bugs (Kaner 2003). For illustration, Figure 3.4 shows the instructions for one
method, frequencyOf(), selected from the class TextAnalyser. The lab sheet is available from
http://undergraduate.csse.uwa.edu.au/units/CITS1200 (Java Programming (CITS1200)
n.d.).
This lab contains some work that will be assessed. In order to get the mark, you must
submit the file TextAnalyser.java. Do NOT submit any other files (for example
TextAnalyser.class, lab06.zip or anything else). The specifications of the TextAnalyser are
as follows:
Instance Variables
In this exercise you decide on the instance variables you will need, rather than being given
them. Re-read lecture 13 and its summary sheet. Then try doing a text analysis by hand.
The things you need to write down and remember to do this task are the instance variables
you will need in the TextAnalyser class.
Constructor
The TextAnalyser has a single constructor that sets up the analyser to start receiving and
analyzing text strings. corpusID is a string that captures the type of text you will be
analyzing, for example "ShakespearesWorks", "abrahamLincolnSpeeches" or
"SMSmessages". Note that I am specifying the spelling "TextAnalyser" and
you are not permitted to substitute "analyze" or "Analyzer".
public TextAnalyser (String corpusID)
Now uncomment the first test in TextAnalyserTest and run it to check your constructor.
Methods
Objects of the class TextAnalyser must have the following public methods; you
should use private helper methods as needed to make your code clearer, shorter and more
readable. Uncomment the tests for each method as you work.
public int frequencyOf (char c)
returns the raw frequency of the character c in the text analysed since the last clearance. It
should return the same value if the user specifies the character in either upper or lower
case.
Figure 3.4. The TextAnalyser Assignment
Test suites for student assignments generally start with a set of basic cases of normal
inputs for each method. Additionally, a good test suite hits every boundary case,
including maximum and minimum input values. Extreme representatives are used to
maximize the coverage of the tests. Examples of JUnit test cases for testing this method are
shown in Figure 3.5. The class TextAnalyser calculates and reports the frequencies of
characters in a given text. This assignment aims to test students' abilities in performing
calculations on elements stored in an array. The method frequencyOf() analyzes the input
text and returns the frequency of each letter in the given string. The test cases are explained
by breaking down the testing program.
- The class TextAnalyserTest is a subclass of TestCase, declared as TextAnalyserTest extends
TestCase;
- The setUp() method sets up a test fixture by creating objects for interaction and
storing them as instance variables in the testing class. Similarly, the method
tearDown() cleans up the test fixture after executing test cases;
- Within each test case, an assert statement is used to validate the result of code
execution. Many types of assert statement are available, such as assertEquals,
assertTrue and assertFalse. Assert statements trigger failures if the expected results
of code execution are not equal to the actual values;
- According to the test case design principles above, test case testFrequencyNormal()
validates the program with a normal input. The case testFrequencyInitial() validates
the initial values of the array stored since the last clearance. For extreme cases,
testFrequencyNonAlpha() checks letter frequencies for an input containing no alphabetic
characters, and testFrequencyEquals() checks that letters occurring equally often
report the same frequency.
import org.junit.Test;
import junit.framework.*;

/* @version March 2010 */
public class TextAnalyserTest extends TestCase {
    TextAnalyser ta;
    // sample strings for testing analyser functions
    String sample = "Freshmen enrolled in software engineering take an introductory"
        + " programming course. Helping novices to learn to program is a difficult task"
        + " as they are lacking programming knowledge and skills for problem solving."
        + " The study at hand aims to develop defect taxonomy for hierarchically organizing"
        + " novice defects to provide a perfect view of what errors novice programmers are making.";
    int[] vals = {102,14,31,58,165,27,28,80,68,0,3,42,13,77,93,16,1,79,44,126,22,24,28,0,10,0};
    String nonalpha = " *** 42 !!";
    String equalAB = "ABaabbAB";

    public void setUp() throws Exception {
        ta = new TextAnalyser("TestCorpus");
    }

    public void tearDown() throws Exception {
        ta = null;
    }

    @Test
    public void testFrequencyInitial() {
        for (char ch = 'a'; ch <= 'z'; ch++) {
            assertEquals(0, ta.frequencyOf(ch));
        }
    }

    @Test
    public void testFrequencyNormal() {
        ta.addTexttoCorpus(sample);
        for (char ch = 'a'; ch <= 'z'; ch++) {
            assertEquals(vals[ch - 'a'], ta.frequencyOf(ch));
        }
        for (char ch = 'A'; ch <= 'Z'; ch++) {
            assertEquals(vals[ch - 'A'], ta.frequencyOf(ch));
        }
    }

    @Test
    public void testFrequencyNonAlpha() {
        ta.addTexttoCorpus(nonalpha);
        for (char ch = 'a'; ch <= 'z'; ch++) {
            assertEquals(0, ta.frequencyOf(ch));
        }
        for (char ch = 'A'; ch <= 'Z'; ch++) {
            assertEquals(0, ta.frequencyOf(ch));
        }
    }

    @Test
    public void testFrequencyEquals() {
        ta.addTexttoCorpus(equalAB);
        assertTrue(ta.frequencyOf('a') == ta.frequencyOf('b'));
    }
}
Figure 3.5. Test Cases for frequencyOf()
How Test Cases are Used to Expose Defects
Using dynamic test cases is an effective way to evaluate software quality because "it is
possible to tell whether the program passed or failed" (Kaner 2003). Dynamic testing
targets individual methods in a class. However, it may be difficult to test methods
individually, because in many cases the interaction of variables and methods is involved.
According to Kaner (2003), "if the program has many bugs, a complex test might fail so
quickly that you don't get to run much of it". For example, if a class constructor
contains a defect, then many test cases will fail too.
The results of executing test cases on code expose the defects in student submissions.
The code shown in Figure 3.6 contains three defects, annotated as d1, d2 and d3. Each
defect in this segment is taken from source code written by students; we synthetically
combine these segments for illustration purposes. In the following sections this code is
used to illustrate how test cases expose defects.
public class TextAnalyser {
    // instance variables - replace the example below with your own
    private String CorpusID;
    private String corpus;
    private int numChars;

    // Constructor for objects of class TextAnalyser
    public TextAnalyser(String corpusID) {
        CorpusID = corpusID;
        corpus = "";
        numChars = 0;
    }

    public void addTexttoCorpus(String text) {
        corpus = corpus + text;                       // d1: the text is appended twice,
        for (int i = 0; i < text.length(); i++) {     //     doubling every character count
            corpus = corpus + text.charAt(i);
            if (Character.isLetter(text.charAt(i))) {
                numChars = numChars + 1;
            }
        }
    }

    public int frequencyOf(char c) {
        int letterFreq = 0;
        for (int i = 0; i < corpus.length(); i++) {
            if ((corpus.charAt(i) == c) || (Character.toLowerCase(corpus.charAt(i)) == c)) {
                letterFreq = letterFreq + 1;
            }
        }
        return letterFreq;
    }

    public double percentageOf(char c) {
        int freq = frequencyOf(c);
        double percentage;
        percentage = freq * 100 / (corpus.length());  // d2: integer division truncates the
        return percentage;                            //     result (and fails on an empty corpus)
    }

    public char mostFrequent() {
        return 'a';                                   // d3: hard-coded return value instead of
    }                                                 //     computing the most frequent character
}
Figure 3.6. An Example of a Buggy Fragment of Class TextAnalyser
The assessment of student programs using NDT has been repeated many times to
ensure that NDT contains as many defect types as possible from the empirical findings.
During the repeated assessment process, test cases have been updated accordingly: new
test cases are added to the testing programs to capture defects identified in the empirical
work. The assessment is repeated until no new defect types are found. For the methods
frequencyOf(), percentageOf() and mostFrequent(), eleven test cases are designed. For
illustration, we first show the global variables of the testing class TextAnalyserTest,
then present the eleven test cases used to test these three methods.
import org.junit.Test;
import junit.framework.TestCase;

public class TextAnalyserTest extends TestCase {
    TextAnalyser ta;
    // sample strings for testing analyser functions
    String sample = "This dissertation presents a new defect taxonomy for classifying defects"
        + " discovered in students' programming assignments. Our study aims to organize defects"
        + " in a systematic way and so provides a tool for evaluating the computer programs of"
        + " individuals or a cohort. We divide the overall problems into five research questions."
        + " Answering these research questions shows how NDT can be used to detect faults in"
        + " programs and to evaluate students' performance.";
    int[] vals = {29, 2, 12, 11, 38, 9, 7, 7, 22, 9, 8, 11, 12, 12, 6, 12, 8, 13, 22, 11, 1, 21, 6, 9, 12, 10};
    double[] percentageArray = {2.6, 0.0, 7.8, 0.0, 7.8, 2.6, 7.8, 2.6, 7.8, 0.0, 0.0, 2.6, 5.2, 5.2, 15.7, 5.2, 0.0, 7.8, 5.2, 2.6, 0.0, 5.2, 0.0, 0.0, 5.2, 0.0};
    String nonalpha = "36+,:…..?";
    String equalsAB = "aaabbbaabb";
1. testFrequencyNormal() tests whether method frequencyOf() returns the raw frequencies of
characters in a given text constructed from letters, punctuation, spaces and anything else.
The frequencies of both upper and lower case characters in the input string should be
counted;

@Test
public void testFrequencyNormal() {
    ta.addTexttoCorpus(sample);
    for (char ch = 'a'; ch <= 'z'; ch++) {
        assertEquals(vals[ch - 'a'], ta.frequencyOf(ch));
    }
    for (char ch = 'A'; ch <= 'Z'; ch++) {
        assertEquals(vals[ch - 'A'], ta.frequencyOf(ch));
    }
}
2. testFrequencyInitial() tests whether the initial stored frequency values equal 0;

@Test
public void testFrequencyInitial() {
    for (char ch = 'a'; ch <= 'z'; ch++) {
        assertEquals(0, ta.frequencyOf(ch));
    }
}
3. testFrequencyNonAlpha() tests whether all character frequencies equal 0 when analyzing
an input text composed of spaces, punctuation and anything else that is not a letter;

@Test
public void testFrequencyNonAlpha() {
    ta.addTexttoCorpus(nonalpha);
    for (char ch = 'a'; ch <= 'z'; ch++) {
        assertEquals(0, ta.frequencyOf(ch));
    }
    for (char ch = 'A'; ch <= 'Z'; ch++) {
        assertEquals(0, ta.frequencyOf(ch));
    }
}
4. testFrequencyEquals() tests whether letters occurring equally often are counted
equally;

@Test
public void testFrequencyEquals() {
    ta.addTexttoCorpus(equalsAB);
    assertTrue(ta.frequencyOf('a') == ta.frequencyOf('b'));
}
5. testPercentageNormal() tests whether method percentageOf() returns the percentage of
the analyzed text made up by a character, for a normal text composed of letters, spaces,
punctuation and anything else;

@Test
public void testPercentageNormal() {
    ta.addTexttoCorpus(shortSample);
    for (char ch = 'a'; ch <= 'z'; ch++) {
        assertEquals(percentageArray[ch - 'a'], ta.percentageOf(ch), 0.1);
    }
}
6. testPercentageInitial() tests whether the initial percentages of all characters equal 0;

@Test
public void testPercentageInitial() {
    for (char ch = 'a'; ch <= 'z'; ch++) {
        assertEquals(0.0, ta.percentageOf(ch), 0.0);
    }
    for (char ch = 'A'; ch <= 'Z'; ch++) {
        assertEquals(0.0, ta.percentageOf(ch), 0.0);
    }
}
7. testPercentageNonalpha() tests whether the percentage equals 0 when analyzing a text
composed of spaces, punctuation and anything else that is not a letter;

@Test
public void testPercentageNonalpha() {
    ta.addTexttoCorpus(nonalpha);
    for (char ch = 'a'; ch <= 'z'; ch++) {
        assertEquals(0.0, ta.percentageOf(ch), 0.0);
    }
    for (char ch = 'A'; ch <= 'Z'; ch++) {
        assertEquals(0.0, ta.percentageOf(ch), 0.0);
    }
}
8. testMostFrequentNormal() tests whether method mostFrequent() returns, in lower case,
the character that occurs most frequently in a given text composed of letters, spaces,
punctuation and anything else;

@Test
public void testMostFrequentNormal() {
    ta.addTexttoCorpus(shortSample);
    assertEquals('o', ta.mostFrequent());
}
9. testMostFrequentInitial() tests whether the initial character is '?' as specified for this
assignment;

@Test
public void testMostFrequentInitial() {
    assertEquals('?', ta.mostFrequent());
}
10. testMostFrequentNonAlpha() tests whether the method correctly returns '?' for an
input text composed of spaces, punctuation and anything else that is not a letter;

@Test
public void testMostFrequentNonalpha() {
    ta.addTexttoCorpus(nonalpha);
    assertEquals('?', ta.mostFrequent());
}
11. testMostFrequentEquals() tests whether method mostFrequent() returns one of the
letters occurring most frequently;

@Test
public void testMostFrequentEquals() {
    ta.addTexttoCorpus(equalsAB);
    assertTrue(ta.mostFrequent() == 'a' || ta.mostFrequent() == 'b');
}
To illustrate how test cases are used to expose defects, the testing results for
different defect sets are presented in Table 3.6. The results of executing the test set are
expected to expose the precise defects in the defect set D.
Defect Set | t1 | t2 | t3 | t4 | t5 | t6 | t7 | t8 | t9 | t10 | t11
d1         | ✗  | ✓  | ✓  | ✓  | ✗  | ✗  | ✗  | ✓  | ✓  | ✓   | ✓
d2         | ✗  | ✓  | ✓  | ✓  | ✗  | ✗  | ✓  | ✗  | ✗  | ✗   | ✗
d3         | ✗  | ✓  | ✓  | ✓  | ✗  | ✗  | ✗  | ✗  | ✗  | ✗   | ✗
d1, d2     | ✗  | ✓  | ✓  | ✓  | ✗  | ✗  | ✓  | ✓  | ✓  | ✓   | ✓
d1, d3     | ✗  | ✓  | ✓  | ✓  | ✗  | ✗  | ✗  | ✗  | ✗  | ✗   | ✗
d2, d3     | ✗  | ✓  | ✓  | ✓  | ✗  | ✗  | ✓  | ✗  | ✗  | ✗   | ✗
d1, d2, d3 | ✗  | ✓  | ✓  | ✓  | ✗  | ✗  | ✓  | ✗  | ✗  | ✗   | ✗
Table 3.6. Test Case Failures and Relevant Defects in a Program (✓ = test passed, ✗ = test failed)
From the table, it can be seen that the surface failures identified by the test cases provide
insufficient information to identify the underlying misconceptions behind these defects.
That is, no set of failing tests uniquely identifies a defect. The presence of one defect
(d1) in D leads to several test failures (t1, t2, ..., tx), and one set of test failures (t1, t2, ...,
tx) may have resulted from different sets of defects. For example, when the
test failure set (t1, t5, t6, t8, t9, t10, t11) is detected, it is likely that d2 occurred, but there is
no direct evidence to determine whether d1 and d3 also exist in the same code segment. This
drawback can be overcome by performing a further inspection of the code by hand.
3.3.7 Code Inspection
Code inspection complements dynamic and static assessment. In a code inspection, the test
failure reports generated by the prior analysis help inspectors concentrate on the specific parts
that may contain faults. Inspectors then examine those parts line by line to reveal the
precise defects in the program.
The code inspection process used in this study conforms to conventional inspection
steps, including an inspection preparation stage and a data collection stage (Mantyla &
Lassenius 2009). During the preparation, an instructor's solution is given to the inspectors.
A checklist and an inspection form are also available to help inspectors locate
faults in a program. In this study, the inspection checklist focuses on faults in the software
product rather than faults in the programming process. Prior to filling in the checklist
form, an inspection form is used to direct the inspectors to specify the defect contents.
Three inspection forms are available, covering the functional, syntactic and style
properties of the code. These forms guide inspectors in filling in the checklist
counts for the functional, syntactic and style defects found in a program. In this
section, we take the Functional Property Inspection Form and the Functional Defect Count
Checklist as illustrations of how functional defects are addressed through the
inspection process. Table 3.7 shows the Functional Property Inspection Form used in
code inspection. The ten questions listed in the left column of Table 3.7 help inspectors
address functional faults in an assignment. Each question is mapped to at least one defect
type in NDT. Next to each inspection question are two columns used to indicate whether an
assignment includes or excludes the defect type. The columns Conceptual Count and Textual
Count are used to record the conceptual and textual counts for code functional properties.
Functional Property Inspection Form
Student ID:
Lab Exercise:
Inspection Date:

ID  Inspection Question                                                            Include  Exclude  Conceptual Count  Textual Count
 1  Variables are declared improperly.
 2  Not all related variables are declared.
 3  Variables have not been initialized before they are used.
 4  Variables are initialized incorrectly.
 5  Array indexes are out of bounds.
 6  The input ranges of variables have not been checked before they are updated.
 7  Expressions and functions are checked incorrectly before they are used.
 8  Loops contain improper conditions.
 9  Variables within loops are not updated in time.
10  There is missing functionality or there are unresolved issues.

Table 3.7. Functional Property Inspection Form
During the code inspection process, the question list in Table 3.7 was reviewed
several times to ensure that the form keeps appropriate records of all defect
information. The Functional Property Inspection Form guides completion of the
Functional Defect Count Checklist. This checklist has 9 items covering 4 areas and keeps
detailed records of functional defects. In the checklist, four columns next to the Defect
Type column record the defect counts of functional faults. Defect types shown as
bold section headings are sub-classes of the defect types shown as bold and underlined
section headings. For example, FUNCTIONAL DEFECT is a main defect category; the
sub-classes within it are PLAN, INITIALIZATION, CHECK and COMPUTATION, and
the sub-sub-classes FUNCTION MISSING and LOGIC are grouped under PLAN. A defect
count is filled in based on the record in the Functional Property Inspection Form.
Table 3.8 below shows the full details of the Functional Defect Count Checklist. In this
study, different inspectors are allowed to classify the same defect into different categories.
When classifying a defect, inspectors make selections based on their programming
knowledge and previous experience. For example, an inspector may place a defect into
the type ASSIGNMENT MALFORMED or the type LOGIC. Some inspectors
may be sure that one category fits the detected defect, while others may be unsure
because they believe more than one category fits. Thus, although we try to
minimize this for the accuracy of the NDT, different counts may still occur.
Functional Defect Count Checklist
Student ID:
Lab Exercise:
Inspection Date:

Defect Type                  Included  Excluded  Conceptual Count  Textual Count
FUNCTIONAL DEFECT
  PLAN
    FUNCTION MISSING
    LOGIC
  INITIALIZATION
    ASSIGNMENT MISSING
    ASSIGNMENT MALFORMED
  CHECK
    VARIABLE
    EXPRESSION
    FUNCTION
  COMPUTATION
    MISSING UPDATE
    MALFORMED UPDATE

Table 3.8. Functional Defect Count Checklist
In the inspection process, an inspector concentrates on the parts of the code indicated by
the previous automatic feedback. For example, since there is no direct evidence to
determine whether d1 and d3 exist in the program tested in Table 3.6, the inspector will pay
more attention to the correctness of the class constructor and the completeness of the
methods, to identify whether any instances of d1 or d3 are present. Inspectors note all the
issues found during the inspection and fill in the checklist form. Finally, the inspector's
records are integrated with the automatic feedback to reveal precisely which faults exist
in the program.
3.4 Measurement Instruments
There are many tools available for the static and dynamic analysis of Java code. In this
section we discuss the measurement tools selected for measuring defects in student code.
JUnit is used to dynamically assess student submissions, and Checkstyle and PMD
statically assess the code style. Other tools such as FindBugs (Findbugs n.d.) can
analyse Java programs, but they are aimed at professional use and impose too great a
learning load on students; such tools are therefore excluded from this research. For each
tool used, we apply the following evaluation criteria:
- Why select this tool, or why not?
- Does it work as expected?
- Is it useful for the development of the defect taxonomy?
- How does the tool work?
3.4.1 Integrated Development Environments
BlueJ (Kolling 1999) is an interactive environment designed for teaching the Java
programming language. It provides a simple interface through which students interact
with objects during development. For students who have completed an introductory
programming course, Eclipse (Eclipse n.d.) is configured and offered in the laboratories.
Plug-ins for Eclipse and BlueJ are available to support additional validation of code
style, functionality, code efficiency and program metrics.
3.4.2 JUnit
xUnit is a family of frameworks, based on a design by Kent Beck, that allows testing of
different elements of functions and classes. JUnit (JUnit4 n.d.) is a testing framework
for the Java programming language that evolved from the xUnit design through the work
of Erich Gamma and Kent Beck. It is an open source testing framework, used to prepare
and run repeatable tests. JUnit is a leading testing tool used to run functional tests both
in teaching and in industry. Additionally, it can be used as a plug-in for integrated
development environments to run test cases automatically, and it reports the success or
failure of executed test cases.
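As a brief sketch of how JUnit can also be driven outside an IDE (the runner class below is invented for illustration; it assumes the TextAnalyserTest class presented in Section 3.3.6), the JUnitCore entry point runs a test class and reports its failures:

import org.junit.runner.JUnitCore;
import org.junit.runner.Result;
import org.junit.runner.notification.Failure;

public class RunStudentTests {
    public static void main(String[] args) {
        // Run every test method in the given test class.
        Result result = JUnitCore.runClasses(TextAnalyserTest.class);
        // Report one line per failing test case.
        for (Failure failure : result.getFailures()) {
            System.out.println(failure.toString());
        }
        System.out.println("Tests run: " + result.getRunCount()
                + ", failures: " + result.getFailureCount());
    }
}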
3.4.3 Checkstyle
Checkstyle (Checkstyle 2001) is a static analysis tool. It scans the abstract syntax tree of
the source code and checks whether the code adheres to the coding standards defined by
its rules. Checkstyle can be plugged in to many IDEs, including Eclipse and BlueJ, and
automates the static validation of Java code. The tool is highly configurable and supports
checks for various coding standards as well as rules written by unit instructors.
Most check rules are configurable. In this study, the check rules are
taken from the Checkstyle 5.4 release (Checkstyle 2001), which provides 15 standard rule
sets. These rule sets group the rules by the style aspect they cover, namely: Annotations;
Block Checks; Class Design; Coding; Duplicate Code; Headers; Imports; Javadoc
Comments; Metrics; Miscellaneous; Modifiers; Naming Conventions; Regular Expressions
(Regexp); Size Violations; and Whitespace (Checkstyle 2001). Among the checks available
from the Sun Code Conventions configuration, the style validations of coding, duplicate
code, Javadoc comments, naming conventions, complexity and size violations are most
relevant to this research. These rule sets are selected and executed on student assignments
to ensure students' code conforms to conventional standards for basic format, comments,
method and file length, code complexity and magic numbers.
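For illustration, a minimal Checkstyle configuration selecting a few of the rules in Table 3.5 might look like the following sketch (an assumed example, not the configuration file used in the study; the max value of 11 matches the complexity feedback shown in Section 3.3.5):

<?xml version="1.0"?>
<!DOCTYPE module PUBLIC
    "-//Puppy Crawl//DTD Check Configuration 1.3//EN"
    "http://www.puppycrawl.com/dtds/configuration_1_3.dtd">
<module name="Checker">
  <module name="TreeWalker">
    <!-- naming conventions -->
    <module name="MethodName"/>
    <module name="MemberName"/>
    <!-- common coding problems -->
    <module name="MagicNumber"/>
    <module name="EmptyBlock"/>
    <!-- Javadoc comments -->
    <module name="JavadocMethod"/>
    <!-- complexity, with the threshold used in this chapter -->
    <module name="CyclomaticComplexity">
      <property name="max" value="11"/>
    </module>
  </module>
</module>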
3.4.4 PMD
PMD (PMD 2002) is a tool that statically analyzes source code. It integrates with various
Java IDEs. It includes 25 built-in rule sets and supports the writing of customizable
rules. PMD 5.0 (PMD 2002) contains rule sets to discover style errors, including:
Android; Basic; Braces; Code Size; Clone; Controversial; Coupling; Design; Finalizers;
Import Statements; J2EE; JavaBeans; JUnit Tests; Logging (Java); Logging (Jakarta);
Migrating; Naming; Optimizations; Strict Exceptions; Strings; Sun Security; Unused
Code; Java Server Pages; and Java Server Faces (PMD 2002). Typically, PMD warnings
are not functional bugs but are associated with style problems; for example, code may
still pass functional testing while failing the PMD checks. The problems PMD looks for
include potential bugs, unused code, suboptimal code, overly complex expressions and
duplication. Potential bugs include empty blocks such as empty try and catch
statements; complex expressions include unnecessary statements in if, for or while
constructs; and code duplication covers copy-and-paste code in a program.
PMD allows users to configure its rule sets flexibly. Defects in submissions are
identified using the Basic and Unused Code rule sets, which mainly discover empty code
blocks, unnecessary conversions, and unused private fields, methods and
local variables.
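As a hedged illustration (an invented fragment, not student code), the following class contains the kinds of dead code that the selected Unused Code rules report:

public class DeadCodeExample {
    private int neverRead;                       // UnusedPrivateField: never read anywhere

    public int doubleValue(int x, int unused) {  // UnusedFormalParameter: 'unused' is never referenced
        int leftover = 0;                        // UnusedLocalVariable: assigned but never used
        return x + x;
    }

    private void helper() {                      // UnusedPrivateMethod: never called
    }
}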
3.5 Comparison of Static Analysis Tools
Figure 3.7. Detection Coverage of the Static Analysis Tools
Figure 3.7 shows the detection range of the two static analysis tools and the overlap
between them. Checkstyle and PMD are configured to detect violations of coding
conventions and code layout. The overlap between the two tools covers 7 areas:
basic style detection, naming, duplication, complexity, code size, unused code
and imports. Defects in novice code mainly fall within the overlap of the two tools plus
some Checkstyle-only rules (e.g. the comments rules). The shaded region in Figure 3.7
depicts the static detection coverage of this study.
[Figure 3.7 regions: Checkstyle only covers Annotations, Comments and Indentation; the overlap of Checkstyle and PMD covers Basic, Naming, Duplication, Complexity, Code Size, Unused Code and Imports; PMD only covers JUnit, J2EE and Sun Security. The novice-code region is shaded.]
3.6 Static Analysis Tools in Practice
In this section, the static analysis tools are used to assess one code segment, and the
results illustrate the different problems revealed by the different tools. The exercise
CustomersList requires students to complete a class using the Java collection class
ArrayList. The domain objects, constructor and methods are presented in the following
figure (see Figure 3.8).
01 public class CustomersList {
02     private ArrayList<String> customerslist;
03     public CustomersList() {
04         customerslist = new ArrayList<String>();
05     }
06     public String getCustomerByIndex(int pos) {
07         return customerslist.get(pos);
08     }
09     public String longestNameCustomer() {
10         if (customerslist.size() == 0) {
11             return "";
12         }
13         int x = customerslist.get(0).length();
14         int y = 0;
15         int n = 0;
16         for (int i = 1; i < customerslist.size(); i++) {
17             y = customerslist.get(i).length();
18             if (y > x) {
19                 n = i;
20                 x = y;
21             }
22         }
23         return customerslist.get(n);
24     }
25     public String listAllCustomers() {
26         String x = "";
27         String y;
28         for (int i = 0; i < customerslist.size(); i++) {
29             y = customerslist.get(i) + "\n";
30             x = x + y;
31         }
32         return x;
33     }
34 }
Figure 3.8. A Solution of Class CustomersList
Each static tool captures different violations within the code segment. The violations
detected by Checkstyle are:
- Line 04: Missing a Javadoc comment (detected by rule set Comments, see Table 3.5);
- Line 06: Parameter pos should be final (detected by rule set Coding, see Table 3.5);
- Line 25: Method 'listAllCustomers' is not designed for extension; it needs to be
abstract, final or empty (detected by rule set Coding, see Table 3.5).
When PMD is run against the code segment, the violation reports are as follows:
- Line 02: It is somewhat confusing to have a field name matching the declaring class
name (detected by rule set Naming Convention, see Table 3.5);
- Line 06: Parameter 'pos' is not assigned and could be declared final (detected by
rule set Coding, see Table 3.5);
- Line 11: A method should have only one exit point, and that should be the last
statement in the method (detected by rule set Coding, see Table 3.5);
- Line 30: Prefer StringBuffer over += for concatenating strings (detected by
rule set Coding, see Table 3.5).
The second violation, 'parameter is not assigned and could be declared final', is also
detected by Checkstyle. The first violation reported by PMD indicates that PMD focuses
on coding conventions and spends more effort on coding style improvement, and the
third warns users to limit the number of return statements in a method. Overall, the
findings suggest that Checkstyle focuses on pure coding standard issues such as
conventions, spacing and indentation, while PMD detects non-conformance with coding
conventions and additionally detects violations that prevent programs from executing
efficiently.
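As a hedged sketch of how a student might respond to this feedback (a revision invented here, not part of the study material), the methods at lines 06 and 25 of Figure 3.8 could be rewritten with a final parameter and a StringBuffer, addressing two of the reported violations:

public String getCustomerByIndex(final int pos) {
    // Checkstyle/PMD: the unassigned parameter is now declared final.
    return customerslist.get(pos);
}

public String listAllCustomers() {
    // PMD: accumulate with a StringBuffer instead of repeated String concatenation.
    final StringBuffer all = new StringBuffer();
    for (int i = 0; i < customerslist.size(); i++) {
        all.append(customerslist.get(i)).append("\n");
    }
    return all.toString();
}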
3.7 Measurement Risks
Measurement risks in this study come from three sources: bias in data source
selection; researcher bias caused by personal assumptions; and the use of the same data
set for training and testing.
Data from many sources may not be comparable because the sources are
heterogeneous, so selection bias may influence the empirical outcome. Selection bias
may arise when data are collected from programmers of different levels (e.g.
beginners, advanced programmers or professional programmers), or when the data
come from a final software product rather than from products at different stages of
programming. To mitigate this weakness, this project draws data from two sources: a
fundamental programming course (CITS1200 Java Programming) and a more advanced
course (CITS1220 Software Engineering), whereas some studies have only one data
source.
Prior experience may affect the thresholds selected for code evolvability and
functionality detection, and this effect extends to the classification of faults during the
observation sessions. The defect count may vary greatly when different thresholds are
selected. This risk cannot be eliminated, only minimized. In this study, different
researchers are allowed to choose different thresholds, judged on the basis of their
programming knowledge and experience; the same applies when inspectors classify
defects.
Generally in machine learning, two different data sets are used: one for developing a
classification system and one for verifying it, so the training set differs from the testing
set. Students in Cohort A were enrolled in the Java Programming unit and were not
given automatic tools for self-assessment. The assessment of Cohort A showed that its
submissions contained many defects, and we used this defect list to develop an initial
NDT. The other cohorts (Cohorts B, C & D) were then used to refine the defect list and
update the categories of NDT. Subsequently, we repeated the measurement process to
enlarge the data sample, and with it the chance of finding new defect categories from the
empirical work, so that NDT contains as many defect categories as possible.
Chapter 4
Novice Defect Taxonomy Specification
This chapter specifies the Novice Defect Taxonomy. The defect specifications,
containing defect definitions, typical examples and detection approaches, are described
in Section 4.2.
4.1 Novice Defect Taxonomy
The defect categories of the Novice Defect Taxonomy (NDT) are identified from previous
studies and from empirical static and dynamic analysis. The complete Novice
Defect Taxonomy is shown in Figure 4.1. To indicate the abstraction level of a
defect, a taxonomy code is assigned to each defect class in NDT. The taxonomy
code consists of the alpha prefix D and a numeric code. The numeric code
indicates the position of the defect class within the NDT tree. For example,
sub-class INCOMPLETE CODE (D1.1) belongs to the defect category CANNOT
COMPILE (D1), and is the first defect class within CANNOT COMPILE.
Figure 4.1. Novice Defect Taxonomy

D1 CANNOT COMPILE
  D1.1 INCOMPLETE CODE
    D1.1.1 NO SUBMISSION
    D1.1.2 UNRECOGNIZED FILE
    D1.1.3 INCOMPLETE METHOD
  D1.2 SYNTAX ERROR
    D1.2.1 TYPE MISMATCH
    D1.2.2 MISMATCHED BRACE {} OR PARENTHESIS ()
    D1.2.3 OTHER SYNTAX ERRORS
D2 COMPILED
  D2.1 FUNCTIONAL DEFECT
    D2.1.1 PLAN
      D2.1.1.1 FUNCTION MISSING
      D2.1.1.2 LOGIC
    D2.1.2 INITIALIZATION
      D2.1.2.1 ASSIGNMENT MISSING
      D2.1.2.2 ASSIGNMENT MALFORMED
    D2.1.3 CHECK
      D2.1.3.1 VARIABLE
      D2.1.3.2 EXPRESSION
      D2.1.3.3 FUNCTION
    D2.1.4 COMPUTATION
      D2.1.4.1 MISSING UPDATE
      D2.1.4.2 MALFORMED UPDATE
  D2.2 EVOLVABILITY DEFECT
    D2.2.1 DOCUMENTATION
      D2.2.1.1 NAMING
      D2.2.1.2 COMMENTS
      D2.2.1.3 CODING
    D2.2.2 STRUCTURE
      D2.2.2.1 SIZE VIOLATION
      D2.2.2.2 COMPLEX CODE
      D2.2.2.3 UNUSED CODE
Figure 4.2. Levels 1 and 2 of the Novice Defect Taxonomy (NOVICE DEFECT divides into D1 CANNOT COMPILE, with D1.1 and D1.2, and D2 COMPILED, with D2.1 and D2.2)
The NDT proposed by this study is a four-level taxonomy that presents a
hierarchical model of the defects in assignments. One challenge in using NDT to
classify defects is that an identified defect may match more than one
defect class. In this study, we set up a counting rule to address this, which is
discussed in Section 5.1.
4.2 Defect Specification
In this section, we specify each defect category of NDT. The defect
specifications contain the following information:
a description of the defect; example code segment(s) containing the specific
defect; and information about how the defect is detected, including the methods
and tools used.
Each code segment is taken from solutions written by students. The
requirements of each lab are available on-line from the Java Programming course
(CITS1200) http://undergraduate.csse.uwa.edu.au/units/CITS1200 and the
Software Engineering course (CITS1220)
http://undergraduate.csse.uwa.edu.au/units/CITS1220.
D1. CANNOT COMPILE
In NDT, the highest-level NOVICE DEFECT is divided into two classes:
CANNOT COMPILE defects and COMPILED defects, depending on whether or not a
defect leads to compilation failure. The CANNOT COMPILE class contains
defects that prevent a program from being successfully compiled; these defects are
usually associated with students lacking syntax knowledge or result from
unintentional behavior. The class CANNOT COMPILE is further divided into two
sub-classes, INCOMPLETE CODE and SYNTAX ERROR, each with its own
sub-classes. The taxonomy of CANNOT COMPILE is presented in Figure 4.3.
Figure 4.3. The CANNOT COMPILE Class (the D1 subtree of Figure 4.1)
D1.1 INCOMPLETE CODE
Class INCOMPLETE CODE is associated with a missing expected file, or an
unrecognized file format, in the reserved directories. Class INCOMPLETE
CODE is divided into three sub-classes: NO SUBMISSION, UNRECOGNIZED
FILE and INCOMPLETE METHOD.
D1.1.1 NO SUBMISSION
Description
A submission with no file found in the reserved directory of the CSSE cssubmit system
(McDonald 2009) is classified into the class NO SUBMISSION.
Detection
The Java compiler fails to find any submission in the reserved directory and generates
a "file not found" error message.
D1.1.2 UNRECOGNIZED FILE
Description
Class UNRECOGNIZED FILE includes defects such as a submitted file with an
unrecognizable format or an incorrect name. For example, a file
BankAccountXXXX.java or BankAccount.zip is classified into UNRECOGNIZED
FILE.
Detection
The Java compiler fails to find a file with the expected name or format and
generates a "file not found" error message.
D1.1.3 INCOMPLETE METHOD
Description
Incorrect method signatures, including missing methods, mistyped method
names or incorrect parameter types, are detected and classified into the class
INCOMPLETE METHOD.
Sample Code
The method insert() fails to compile because it should return a double array but
contains no return statement.

01 public class ArrayUtilities {
02     public double[] insert(double k, double[] a, int p) {
03         // TODO add code for this method here
04     }
05 }

Detection
The Java compiler cannot compile the submission. An error message "cannot find
symbol" is generated when the method signature is incorrect.
D1.2 SYNTAX ERROR
Language syntax defines the correct use of symbols and tokens to construct
programs. In this study, a shell script is written to assess the code syntax
automatically: once a program fails the Java compiler, an error message
"compilation failed" is generated. This message may mask the remaining syntax
errors, which need to be confirmed by performing an inspection.
D1.2.1 TYPE MISMATCH
Description
Class TYPE MISMATCH is associated with defects such as an expected return
value whose data type is incompatible with the actual return type, or a missing
expected return value.
Sample Code
A value of type double should be returned by the method sum(), but the
return statement is missing. The sample code contains several defects that may
affect the understanding of this code segment. First, the parameter should be of
type double[] instead of double. Second, the loop counter should be an int
rather than a double. Third, the continuation condition of the for loop is incorrect:
it should be i < a.length. Fourth, sum could be a local variable, but it
has not been declared. Lastly, the method sum() must return a double.

01 public class ArrayUtility {
02     public double sum(double a) {
03         for (double i = 0; i > a; i++) {
04             sum += a;
05         }
06     }
07 }

Detection
The Java compiler fails on this code and generates an error message warning that a
compilation error has been detected. The actual cause of the syntax error may need to be
confirmed by a code inspection; the findings from inspection are used to classify the
defect at a deeper level of NDT, down to a leaf defect category.
D1.2.2 MISMATCHED BRACE {} OR PARENTHESIS ()
Description
Class MISMATCHED BRACE {} OR PARENTHESIS () is associated with defects
such as the unbalanced placement of parentheses, braces or brackets in programs.
Sample Code
A right parenthesis is missing in the conditional expression of the if statement.

01 public class BankAccount {
02     public void applyInterest() {
03         if (balance < 0 {
04             balance = (int)
05                 (balance * rate + balance);
06         } ...
07     }
08 }

Detection
The Java compiler cannot compile the submission and generates a
"compilation failed" message. The actual cause of the syntax error may need to be
confirmed by performing a code inspection; the findings from inspection are used to
classify the defect at a deeper level of NDT, down to a leaf defect category.
D1.2.3 OTHER SYNTAX ERRORS
Description
Class OTHER SYNTAX ERRORS includes the syntax error types excluded by classes
D1.2.1 TYPE MISMATCH and D1.2.2 MISMATCHED BRACE {} OR
PARENTHESIS ().
Sample Code
The class ArrayUtility fails to compile because the variable sum has not been defined.
This defect, classified into category OTHER SYNTAX ERRORS, leads to a
compilation failure.

01 public class ArrayUtility {
02     public double sum(double a[]) {
03         for (int i = 0; i < a.length; i++) {
04             sum += a[i];
05         }
06         return sum;
07     }
08 }

Detection
The Java compiler cannot compile the submission and generates a
"compilation failed" error message. The underlying cause of the defect is confirmed by
performing a code inspection.
D2. COMPILED
Class COMPILED is divided into two sub-classes: FUNCTIONAL DEFECT and
EVOLVABILITY DEFECT. The analysis of the COMPILED class and its
sub-classes uses automated approaches (both test cases and code style checking)
and manual techniques (code inspection).
D2.1 FUNCTIONAL DEFECT
The FUNCTIONAL DEFECT class is divided into four sub-classes: PLAN,
INITIALIZATION, CHECK and COMPUTATION. These sub-classes are identified
from previous studies (Basili & Selby 1987; Chillarege et al. 1992) and from code
inspection. Figure 4.4 presents the sub-classes and sub-sub-classes of the defect class
FUNCTIONAL DEFECT.
Figure 4.4. The FUNCTIONAL DEFECT Taxonomy (the D2.1 subtree of Figure 4.1)
D2.1.1 PLAN
Class PLAN refers to failures resulting from missing functionality or from an
incorrectly implemented coding strategy. This defect class is similar to the larger
defects type proposed by Mantyla & Lassenius (2009): both PLAN defects in this
study and larger defects require a large modification, or additional code to be added,
to a program. PLAN is subdivided into two subgroups: FUNCTION
MISSING and LOGIC.
D2.1.1.1 FUNCTION MISSING
Description
Unlike the sub-class INCOMPLETE METHOD in the CANNOT COMPILE class,
the class FUNCTION MISSING covers code where the method signature is correct
but the method body is missing. Such code passes the Java compiler but fails the JUnit
test cases, whereas INCOMPLETE CODE fails the Java compiler.
Sample Code
The sample code segment shows topLetters() directly returning a literal String,
with no method body to take the expected actions.

01 public class TextAnalyser {
02     public String topLetters(int wordlength) {
03         return "etao";
04     }
05 }
Detection
First, the code is successfully compiled by the Java compiler; it then fails the JUnit
tests. The actual cause of the functional defect may need to be confirmed by
conducting a code inspection of the submission.
D2.1.1.2 LOGIC
Description
Class LOGIC is associated with defects made in code blocks using mathematical
operators. A defect is detected when unintended outputs are observed or the
program terminates abnormally.
Sample Code
The method find() is expected to return the index of an element in the parameter array.
Instead, it returns a value accumulated from the values stored in the array, rather than
the element index.

01 public int find(double k, double[] a) {
02     for (int i = 0; i < a.length - 1; i++) {
03         k += a[i];
04     }
05     return (int) (k);
06 }

The correct code is shown below:

01 public int find(double k, double[] a) {
02     for (int i = 0; i < a.length; i++) {
03         if (a[i] == k) {
04             return i;
05         }
06     }
07     return -1;
08 }

Detection
The code is successfully compiled but fails some JUnit tests. The actual cause of
the functional defect should be confirmed by performing a code inspection.
D2.1.2 INITIALIZATION
Class INITIALIZATION covers defects that occur during variable initialization in
constructors and methods. It is subdivided into two sub-classes, ASSIGNMENT
MISSING and ASSIGNMENT MALFORMED, both taken from the study of Kopec,
Yarmish & Cheung (2007).
D2.1.2.1 ASSIGNMENT MISSING
Description
Class ASSIGNMENT MISSING is associated with missing variable initializations,
or incorrect data structures used, in constructors or methods.
Sample Code
The sample code segment shows a constructor that is missing the initializations of
the instance variables maxBalance and minBalance.

01 public BankAccount(String accountName, int balance) {
02     this.balance = balance;
03     this.accountName = accountName;
04 }
Detection
After the code has been successfully compiled by the Java compiler, initialization to
non-default values is detected with JUnit; a code inspection is needed when the
expected value is the default value but the assignment is missing.
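A hedged sketch of such a JUnit check (the test class and the accessors getMaxBalance() and getMinBalance() are assumed for illustration and are not taken from the study's test suites):

import org.junit.Test;
import junit.framework.TestCase;

public class BankAccountInitializationTest extends TestCase {
    @Test
    public void testConstructorInitialization() {
        BankAccount acc = new BankAccount("savings", 100);
        // Both extremes are expected to start at the opening balance; if the
        // constructor never assigns them, these assertions fail and expose
        // an ASSIGNMENT MISSING defect.
        assertEquals(100, acc.getMaxBalance());
        assertEquals(100, acc.getMinBalance());
    }
}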
D2.1.2.2 ASSIGNMENT MALFORMED
Description
Class ASSIGNMENT MALFORMED is associated with defects such as
incorrect assignment statements in constructors.
Sample Code
Line 04 in the following segment shows an incorrect variable assignment: the
initial value of minBalance is expected to equal the value of balance.

01 public class BankAccount {
02     public BankAccount(String accountName, int balance) {
03         this.balance = balance;
04         minBalance = 0;
05         ...
06     }
07 }
Detection
First, the code is successfully compiled; we then detect the initialization to
non-default values. The actual cause of an initialization defect is confirmed by
performing a code inspection.
D2.1.3 CHECK
Class CHECK addresses defects made when a required validation check is
incorrect or missing. This class conforms to the findings of Mantyla
& Lassenius (2009), who provide only the high-level category CHECK; we
subdivide it into three sub-classes: VARIABLE,
EXPRESSION and FUNCTION.
D2.1.3.1 VARIABLE
Description
Class VARIABLE is associated with defects in checking that an input variable is
within a valid range.
Sample Code
The range of the interest declared in line 02 should be checked before the balance is updated.

01 public void applyInterest(double rate) {
02     int interest;
03     interest = (int) (balance * rate);
04     if (balance < 0)
05         { balance = balance - interest; }
06 }
Detection
After the code is compiled, it fails some of the boundary and extreme tests, which exercise the boundary and the extremities of a valid range. An extreme case executes statements with an invalid input, while a boundary case supplies an input on the boundary of the valid range. For example, a boundary case tests whether the method withdraw() works when the bank balance equals 0, and an extreme case tests whether this method works when the balance equals -100. A code inspection is needed to confirm whether the detected defect belongs to VARIABLE, EXPRESSION or FUNCTION.
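A minimal JUnit 4 sketch of one boundary case and one extreme case follows. The BankAccount(String, int) constructor, the withdraw(int) method, the getBalance() accessor, and the rule that invalid withdrawals leave the balance unchanged are all assumptions for illustration.

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class BankAccountCheckTest {

    // Boundary case: withdrawing when the balance is exactly 0.
    @Test
    public void withdrawAtZeroBalanceIsRejected() {
        BankAccount account = new BankAccount("alice", 0);
        account.withdraw(10); // assumed to be rejected by a guard
        assertEquals(0, account.getBalance());
    }

    // Extreme case: an account already at an invalid balance of -100.
    @Test
    public void withdrawOnOverdrawnAccountIsRejected() {
        BankAccount account = new BankAccount("bob", -100);
        account.withdraw(10); // assumed to be rejected by a guard
        assertEquals(-100, account.getBalance());
    }
}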
D2.1.3.2 EXPRESSION
Description
Class EXPRESSION is associated with defects such as unguarded expressions.
Sample Code
The ranges of both the variables rate and balance should be checked.
public void applyInterest(double rate) {
    int interest;
    interest = (int) (balance * rate);
    if (isOverdrawn() || rate < 0) {
        balance = balance - interest;
    }
}
The correct code is shown in the following:
public void applyInterest(double rate) {
    int interest;
    interest = (int) (balance * rate);
    if (isOverdrawn() && rate < 0) {
        balance = balance - interest;
    }
}
Detection
After the code is compiled, it fails some of the boundary and extreme tests that exercise the boundary and the extremities of a valid range. A code inspection is needed to confirm whether the detected defect belongs to VARIABLE, EXPRESSION or FUNCTION.
D2.1.3.3 FUNCTION
Description
Class FUNCTION includes defects such as an incomplete subroutine call or a missing subroutine call.
Sample Code
The function isOverdrawn() is expected to be called in the applyInterest() method.
public void applyInterest(double rate) {
    int interest;
    interest = (int) (balance * rate);
    if (rate < 0) {
        balance = balance - interest;
    }
}
The following segment shows a correct solution:
public void applyInterest(double rate) {
    int interest;
    interest = (int) (balance * rate);
    if (rate < 0) {
        if (isOverdrawn()) {
            balance = balance - interest;
        }
    }
}
Detection
After the code is compiled, it fails some boundary and extreme tests. A code inspection is needed to confirm whether the detected defect belongs to VARIABLE, EXPRESSION or FUNCTION.
D2.1.4 COMPUTATION
Class COMPUTATION is associated with defects in mathematical operations that may involve, in addition to arithmetic operators, one or more of the following: constants, functions (methods), and equality and relational operators. The COMPUTATION class is divided into two sub-classes, MISSING UPDATE and MALFORMED UPDATE, both taken from the study of Kopec, Yarmish & Cheung (2007).
D2.1.4.1 MISSING UPDATE
Description
Class MISSING UPDATE includes defects in which a required update of a variable's value is not present.
Sample Code
In the helper method deposit(), maxBalance is expected to be updated. The update of maxBalance is missing in the following segment.

public void deposit(int amount) {
    if (amount > 0) {
        balance += amount;
        valueDeposits += amount;
    }
}
Detection
First the code is successfully compiled by the Java compiler. The code then fails the JUnit tests relevant to the specific method, and the test results show inspectors that the value of maxBalance is incorrect. The code is therefore suspected of containing a MISSING UPDATE or MALFORMED UPDATE defect; a code inspection is needed to confirm the most appropriate defect class.
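A corrected sketch of deposit() follows; the rule that maxBalance tracks the highest balance reached is assumed from the exercise description.

public void deposit(int amount) {
    if (amount > 0) {
        balance += amount;
        valueDeposits += amount;
        // The previously missing update: record a new maximum
        // whenever the balance exceeds the old one (assumed rule).
        if (balance > maxBalance) {
            maxBalance = balance;
        }
    }
}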
D2.1.4.2 MALFORMED UPDATE
Description
Class MALFORMED UPDATE is associated with defects in which an assignment to a variable is present but updates it with the wrong value.
Sample Code
In line 07 of the following segment, the variable maxBalance should be updated instead of balance.

01 public class BankAccount {
02     public void deposit(int amount) {
03         if (amount > 0) {
04             balance = balance + amount;
05             sumDeposits += amount;
06             if (balance > maxBalance) {
07                 balance = maxBalance;
08             }
09         }
10     }
11 }
Detection
First the code is successfully compiled by the Java compiler. The code then fails the JUnit tests relevant to the specific method, and the test results show inspectors that the value of the variable is incorrect. The code is therefore suspected of containing a MISSING UPDATE or MALFORMED UPDATE defect, which requires a code inspection to confirm.
D2.2 EVOLVABILITY DEFECT
Evolvability defects affect program development and maintenance effort rather than runtime behavior. The term 'evolvability defect' is taken from the taxonomy developed by Siy & Votta (2001). The identification of an evolvability defect is undertaken by static analysis. The evolvability defect class and its sub-classes as proposed by Mantyla & Lassenius (2009) are used in this study; they are discussed in detail in Section 5.1.
The defect class EVOLVABILITY DEFECT has two sub-classes, DOCUMENTATION and STRUCTURE. Class EVOLVABILITY DEFECT and its sub-classes are presented in Figure 4.5.
Figure 4.5. EVOLVABILITY DEFECT Taxonomy
D2.2.1 DOCUMENTATION
Documentation is the information provided in the source code to help developers
understand the program. The DOCUMENTATION class has three sub-classes:
NAMING, COMMENTS and CODING.
D2.2.1.1 NAMING
Description
Class NAMING detects Java identifiers that do not conform to the naming conventions of the Java Language Specification and the Sun Coding Conventions. The defect identification is undertaken using Checkstyle (2001), with the rules applied as follows:
ConstantName checks that names of constants are composed of upper-case letters and digits;
LocalVariableName checks that local non-final variable identifiers begin with a letter, followed by letters or digits;
MemberName checks non-static member names, which begin with a letter, followed by letters or digits;
MethodName checks that method names begin with a letter, followed by letters or digits;
ParameterName checks that names of parameters begin with a letter, followed by letters and digits;
StaticVariableName checks that static non-final variable names begin with a letter, followed by letters and digits;
TypeName ensures that class and interface names begin with a letter and are constructed from letters and digits.
Sample Code
The declaration of variable AccountName in line 03 violates the convention. The
variable name should be in camel-case.
01 public class BankAccount {
02     public int balance;
03     public String AccountName;
04     ...
05 }
Detection
The code is successfully compiled and run. It violates the Naming rules of Checkstyle (see Table 3.6).
D2.2.1.2 COMMENTS
Description
Class COMMENTS is mainly associated with defects in which the explanation of classes, constructors, methods, interfaces and variables is missing from the source code. The measurement modules for class COMMENTS are taken from Checkstyle (Checkstyle 2001) and listed as follows:
JavadocMethod checks comments for constructors and methods in code;
JavadocType checks "comments for interface and class definitions", optionally ignoring the "author or version tags";
JavadocVariable checks "that variables have Javadoc comments".
Sample Code
A Javadoc comment is missing for chargeInterest().
public class BankAccount {
    /**
     * Account balance.
     */
    private int balance;
    ...
    public void chargeInterest(double rate) {
    }
}
Detection
The code is successfully compiled by the Java compiler. It violates rules of the Comments rule set (see Table 3.6).
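A sketch of a Javadoc comment that would satisfy the rule for chargeInterest() follows; the wording of the description and the @param text are illustrative only.

/**
 * Charges interest on this account at the given rate.
 * (Illustrative wording: the real comment would follow the
 * exercise specification.)
 *
 * @param rate the interest rate to apply
 */
public void chargeInterest(double rate) {
}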
D2.2.1.3 CODING
Description
Class CODING is associated with defects such as (Checkstyle 2001):
empty statements or blocks;
inline conditionals;
"inner assignments in sub-expressions, such as in String s = Integer.toString(i = 2)";
MissingSwitchDefault, which checks for the existence of the default clause in switch statements.
Sample Code
There is an empty else branch in an if-else block.
public char mostFrequent() {
    ...
    if (countarray[i] >= countarray[maxindex]) {
        maxindex = i;
    }
    else {
    }
    ...
}
Detection
The code is successfully compiled by the Java compiler. It violates rules of the Coding rule set (see Table 3.6).
D2.2.2 STRUCTURE
Class STRUCTURE is associated with source code whose structure is too long or too complicated. Its three sub-classes, SIZE VIOLATION, COMPLEX CODE and UNUSED CODE, are presented in Figure 4.5.
D2.2.2.1 SIZE VIOLATION
Description
Class SIZE VIOLATION includes defects such as a method with too many parameters, a file or method with too many lines, or duplicate code. The detection module FileLength checks for long source files; its threshold is set to the length of an instructor's solution. The module MethodLength flags long methods and constructors, which may lead to code that is hard to understand; as a remedy, long methods and classes should be broken down into sub-methods. Like FileLength, the MethodLength threshold is set per exercise to the maximum method length of the instructor's solution. The validation modules in class SIZE VIOLATION are also taken from Checkstyle (Checkstyle 2001) and listed as follows:
FileLength "checks for long source files"; the file length limit is set to the length of the solution written by an instructor;
MethodLength "checks for long methods and constructors"; the method length limit is set to the maximum method length of the solution written by an instructor;
ParameterNumber "checks the number of parameters of a method or constructor"; the default value is 3 in this study;
DuplicateCode performs "a line-by-line comparison of all code lines and reports duplicate code if a sequence of lines differs only in indentation"; the default threshold is set to 12 lines.
Sample Code
The length of the following function frequency() is 55.
public class TextAnalyser {
    int frequency[] = new int[26];
    String alphabet = "abcdefghijklmnopqrstuvwxyz";
    int charcount = 0;
    ...
    public int frequency(char c) {
        if (c == alphabet.charAt(0)) {
            charfrequency = frequency[0];
        }
        if (c == alphabet.charAt(1)) {
            charfrequency = frequency[1];
        }
        if (c == alphabet.charAt(2)) {
            charfrequency = frequency[2];
        }
        if (c == alphabet.charAt(3)) {
            charfrequency = frequency[3];
        }
        ...
        return charfrequency;
    }
}
Detection
The code is successfully compiled by the Java compiler. It violates rules of the Size Violation rule set (see Table 3.6).
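For comparison, a sketch of how the repeated if-chain above can be collapsed by indexing into the frequency array directly; the behaviour for non-alphabet characters (returning 0) is an assumption.

public int frequency(char c) {
    // Look the character up in the alphabet instead of testing
    // each of the 26 letters with its own if-statement.
    int index = alphabet.indexOf(c);
    if (index < 0) {
        return 0; // assumed result for non-alphabet characters
    }
    return frequency[index];
}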
D2.2.2.2 COMPLEX CODE
Description
Class COMPLEX CODE is associated with code that could be simplified by using helper methods. The following list gives the validation modules selected from Checkstyle (Checkstyle 2001):
BooleanExpressionComplexity checks the number of operators such as && and || in an expression, since too many may make the code difficult to understand, debug and maintain; the maximum allowed is set to 3 in this study;
CyclomaticComplexity measures "the number of if, while, do, for, ?:, catch, switch, case statements and operators && and || in the body of a constructor, method, static initializer, or instance initializer. It is a measure of the minimum number of possible paths through the source and therefore the number of required tests". Generally, below 8 is fine and 11+ needs to be rewritten; the default complexity limit is set to 11 in this study;
NPathComplexity limits the number of possible execution paths through a function; the default maximum is 200 in Checkstyle.
Sample Code
The cyclomatic complexity of TextAnalyser is 55.
public class TextAnalyser {
    public String topLetters(int wordlength) {
        String toplet = "";
        char[] letters = new char[wordlength];
        int[] numlet = new int[wordlength];
        letters[0] = mostFrequent();
        if (letters[0] == 'a') {
            numlet[0] = a;
            ...
        }
        else if (letters[0] == 'z') {
            numlet[0] = z;
        }
        else { numlet[0] = 0; }
        for (int i = 1; i < wordlength; i++) {
            letters[i] = mostFrequentuptoMax(numlet[i-1]);
            if (letters[i] == 'a') {
                numlet[i] = a;
                ...
            }
            else if (letters[i] == 'z') {
                numlet[i] = z;
            }
            else { numlet[i] = 0; }
        }
        for (int i = 0; i < wordlength; i++) {
            toplet = toplet + letters[i];
        }
        return toplet;
    }
}
Detection
The code is successfully compiled by the Java compiler. It violates the Complex Code rule set of Checkstyle (see Table 3.6).
D2.2.2.3 UNUSED CODE
Description
Class UNUSED CODE is associated with statements that are never executed because there is no path to them from the rest of the program. The PMD validation modules detect unused local variables, private methods and fields:
UnusedPrivateField “detects when a private field is declared and assigned a value, but
not used”;
UnusedLocalVariable “detects when a local variable is declared and assigned, but not
used”;
UnusedPrivateMethod “detects when a private method is declared but is unused”;
UnusedFormalParameter detects "passing parameters to methods or constructors" where those parameters are never used.
Sample Code
The if() statement can never be executed.

public class TextAnalyser {
    ...
    private String (String s) {
        final xxx = false;
        if (xxx) {

        }
    }
    ...
}
Detection
The code is successfully compiled by the Java compiler. It violates the Unused rule set of PMD (see Table 3.6).
4.3 Summary
This chapter specifies NDT at both the high-level categories (e.g. Code Completeness, Syntax Error, Functional Defect and Evolvability Defect) and the low-level categories (e.g. Function Missing and Logic). For each category, the defect specification contains a description of the defect, an example code segment and approaches for detecting the defect.
Chapter 5
Analysis Using the Novice Defect
Taxonomy
We believe that NDT can be used as a tool to hierarchically classify defects in a
reproducible way. In this study, both qualitative analysis (defect type) and
quantitative analysis (defect count) are performed on student submissions. To
present the defect count we use both the textual signature count and the conceptual signature count introduced in Chapter 3 (p. 37). The textual and conceptual signature counts per submission (TDC/Sub and CDC%) are additionally used to record defect counts.
Definition: Textual Defect Count per Submission (TDC/Sub) measures the textual signature count per submission of a given cohort.
Definition: Conceptual Defect Count per Submission (CDC%) measures the percentage of subjects who made at least one error in a given cohort.
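A small sketch of how these two measures can be computed from per-submission counts follows; representing a cohort as an int array of textual defect counts, one entry per submission, is an assumption for illustration.

import java.util.Arrays;

public class DefectCounts {

    // Textual Defect Count per submission (TDC/Sub): the total textual
    // signature count divided by the number of submissions in the cohort.
    static double tdcPerSub(int[] textualCounts) {
        return (double) Arrays.stream(textualCounts).sum() / textualCounts.length;
    }

    // Conceptual Defect Count (CDC%): the percentage of submissions
    // containing at least one defect of the given type.
    static double cdcPercent(int[] textualCounts) {
        long affected = Arrays.stream(textualCounts).filter(c -> c > 0).count();
        return 100.0 * affected / textualCounts.length;
    }

    public static void main(String[] args) {
        int[] cohort = {0, 2, 0, 1, 3}; // hypothetical counts for 5 submissions
        System.out.println(tdcPerSub(cohort));  // 1.2
        System.out.println(cdcPercent(cohort)); // 60.0
    }
}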
Additionally, a counting rule is set up for NDT users to record the count for each defect. When the identified defect matches more than one defect class, only the class with the higher priority is selected. In NDT, higher priority is given to the defect class at a lower depth; for example, a depth-two defect class has a higher priority than a depth-three defect class. Within the same depth, a defect class with a lower taxonomy code has a higher priority. For example, consider a method that requires an array data structure, but instead a student declares separate variables to store each item. In this study, such a fault would be classified in the sub-class LOGIC in the PLAN category rather than the sub-class MALFORMED UPDATE in COMPUTATION, because the LOGIC defect class has the higher priority.
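This priority rule can be expressed as a comparator over taxonomy codes such as "D2.1.1.2". Treating the number of dot-separated segments as the depth, and plain string order as the tie-breaker (adequate here since all segment numbers are single digits), is an assumed representation.

import java.util.Comparator;

public class DefectPriority {

    // Shallower taxonomy codes outrank deeper ones; within equal
    // depth, the lexicographically smaller code has priority.
    static final Comparator<String> PRIORITY =
            Comparator.comparingInt((String code) -> code.split("\\.").length)
                      .thenComparing(Comparator.naturalOrder());

    public static void main(String[] args) {
        // LOGIC (D2.1.1.2) outranks MALFORMED UPDATE (D2.1.4.2):
        // equal depth, but a lower taxonomy code.
        System.out.println(PRIORITY.compare("D2.1.1.2", "D2.1.4.2") < 0); // true
    }
}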
This chapter evaluates Novice Defect Taxonomy (NDT) by using it to analyze a
large number of student programming assignments. The data is used to answer
eight research questions proposed in Section 1.4.
5.1 Comparison of NDT Defect Categories with
Other Defect Taxonomies
In this section we use data collected from qualitative analysis to answer research
questions: what types of defects are identified from student submissions (Section
1.4.1)? and how are these types related to existing defect taxonomies (Section
1.4.2)?
One reason for assessing a large number of student programs is to discover rare
defect types that might not have been addressed or classified by previous
taxonomies. In this section, we compare NDT categories with other defect
taxonomies and discuss their similarities and differences. From the qualitative
analysis, it is found that the sub-classes of FUNCTIONAL DEFECT and EVOLVABILITY DEFECT match well with existing studies. Some new NDT
categories found in this empirical work belong to the class INCOMPLETE CODE.
A taxonomy for the defect class INCOMPLETE CODE is created on the basis of previous studies as well as empirical findings. INCOMPLETE CODE has three sub-classes, D1.1.1 NO SUBMISSION, D1.1.2 UNRECOGNIZED FILE and D1.1.3 INCOMPLETE METHOD, which address the problems of no submission, incorrect submission, and incomplete functions found in programs. The defects NO SUBMISSION and UNRECOGNIZED FILE have also been addressed by previous studies. Both Ahmadzadeh, Elliman & Higgins (2005) and Coull et al. (2003) presented a defect type named "submission not found" to identify the no-submission problem. Jackson, Cobb & Carver (2004) addressed files with an unrecognizable format or incorrect names and named this defect "files with improper names". The sub-class INCOMPLETE METHOD has never been identified by existing taxonomies.
The sub-classes of SYNTAX ERRORS are a superset of previous classifications. The defect class TYPE MISMATCHED has also been identified by Ahmadzadeh, Elliman & Higgins (2005). The defect class MISMATCHED BRACE OR PARENTHESIS has been identified by several previous studies, for example Coull et al. (2003), Hristova et al. (2003) and Jackson, Cobb & Carver (2004). These two defect categories are the most frequent syntax errors in the quantitative analysis. All other syntax defects, with lower occurrence rates, are classified into the defect class OTHER SYNTAX ERRORS.
Sub-classes of FUNCTIONAL DEFECT agree with findings from previous
work. FUNCTIONAL DEFECT is classified into four sub-classes: PLAN,
INITIALIZATION, CHECK and COMPUTATION. These categories are derived
both from our analysis and from subsets of previous classifications. Some functional categories are not relevant for novice defect analysis and are excluded from this study. For example, defects relating to software interfaces are proposed by Chillarege et al. (1992) and Mantyla & Lassenius (2009); in this analysis, however, no interface-specific defects were observed in the assignments, so there is no interface category in NDT. The defect PLAN conforms to the plan defect proposed by Siy & Votta (2001), which refers to a large number of improperly implemented statements; such defects may require large-scale modification of the software. Basili & Selby's (1987) classification contains a class Initialization but makes no distinction between the omission and commission cases: a defect is determined to be an INITIALIZATION defect when the correct statements are not present or a resource is initialized incorrectly. In NDT, the further sub-classes D2.1.2.1 ASSIGNMENT MISSING and D2.1.2.2 ASSIGNMENT MALFORMED are proposed to distinguish these two possibilities. Class CHECK, referring to improper instance assurance or improper method guards, has been detected by Chillarege et al. (1992) and Mantyla & Lassenius (2009). The type D2.1.4 COMPUTATION covers defects made in mathematical computations.
Prior studies emphasize that "the majority of code findings are evolvability defects" (Siy & Votta 2001): defects that affect how easy the code is to understand, correct and maintain in the long term. Our findings about the evolvability classes are similar to those in other taxonomies. The sub-classes of EVOLVABILITY DEFECT (D2.2) are a subset of previous findings. The main defect types DOCUMENTATION (D2.2.1) and STRUCTURE (D2.2.2) are taken from Mantyla & Lassenius (2009). The sub-classes NAMING, COMMENTS and CODING are named after Checkstyle rule sets. The sub-classes SIZE VIOLATION, COMPLEX CODE and UNUSED CODE of defect STRUCTURE, confirmed by Mantyla & Lassenius (2009), are likewise named after the rule sets of Checkstyle and PMD.
5.2 Quantitative Analysis of Defects
There are discrepancies between the most frequent faults that instructors believe students are making and the faults that students are aware of or are actually making (Ala-Mutka 2005). This section addresses the research questions: what are the
most common defects made by novices (Section 1.4.3)? Are these categories
consistent with previous work (Section 1.4.4)?
When counting defects in programs, a decision is needed whether to count the total number of defects, the number of assignments containing defects, or both. As defined in Section 3.3.2, counting a defect category textually results in multiple counts if the same fault occurs many times in one submission. Conceptual counts are used to measure how many assignments contained the defect. Both defect types and frequencies are tracked to create an overall list covering the most frequent defect types.
Another concern of this research is what kinds of mistakes novices are prone to
make. Previous studies have identified the most common errors by conducting
surveys (Flowers, Carver & Jackson 2004; Jackson, Cobb & Carver 2004) or by
counting and classifying common faults identified from assessing student
assignments (Chabert & Higginbotham 1976; Kopec, Yarmish & Cheung 2007).
The textual defect counts of all the assignments we analyzed are shown in Table 5.1; the data are collected from 1271 submissions completed by four cohorts. The top ten defects are ranked by Conceptual Defect Count per Submission (CDC%), which shows the percentage of submissions containing the defect. The defect classes in Table 5.1 are taken from the bottom-level categories of NDT.
Defect Type TDC CDC %
1 D2.2.1.2 COMMENTS 1790 29.11%
2 D2.1.3.1 CHECK-VARIABLE 1022 20.27%
3 D2.2.1.1 NAMING 604 16.97%
4 D2.2.2.2 COMPLEX CODE 748 16.81%
5 D2.2.2.1 SIZE VIOLATION 590 14.30%
6 D2.2.2.3 UNUSED CODE 405 13.38%
7 D2.2.1.3 CODING 339 13.20%
8 D2.1.4.2 MALFORMED UPDATE 268 10.38%
9 D1.1.1 NO SUBMISSION 179 7.81%
10 D2.1.4.1 MISSING UPDATE 82 3.23%
Table 5.1. Top Ten Defects from the UWA Data Set
The most common defect (Table 5.1) reveals that students do not see the importance of writing proper comments in their programs. In the exercises selected for the defect analysis, a code skeleton with some global fields written by instructors is provided to guide students. It is hoped that students would read and understand the purpose of each method and then write and comment their code, but it is observed that very few students completed the tags and generated meaningful comments on these fields. Class NO SUBMISSION, at a rate of 7.81%, shows that failing submissions are often found among novices; this finding is confirmed by the observations of Robins (2010). Note that in Table 5.1 the majority of top defects belong to EVOLVABILITY DEFECT, with the rest belonging to CODE COMPLETENESS or FUNCTIONAL DEFECT. It is noticeable that the bottom-level categories COMPLEX CODE, SIZE VIOLATION and UNUSED CODE are all types of STRUCTURE defects. One reason is that novices may lack the knowledge and skills of program planning (Lister & Leaney 2003): they focus on a small part of the overall structure and thereby design and generate their programs line by line rather than planning the larger structure of the whole class (Ala-Mutka 2005; Soloway & Spohrer 1989a).
5.3 Defect Patterns and the Difficulty of Exercises
In this section we derive defect patterns to answer the research question: what do
cohort defect patterns tell us about programming exercises (Section 1.4.5)?
Students enrolled in the units Java Programming and Software Engineering complete a series of laboratory exercises during a teaching semester. Each task is designed to evaluate students' ability to apply the techniques they have learnt. In NDT, the sub-class FUNCTION MISSING measures incomplete methods in submissions. Students produce the functional defect FUNCTION MISSING when they are unable to complete a function or cannot even start coding it.
Defect Type: D2.1.1.1 FUNCTION MISSING
Complexity Level         Cohort (Submission No.)   TDC    TDC/Sub   CDC   CDC%   TDC/CDC
Level 1 (Fast)           A1 (N=94)                 0      0         0     0%     0
                         B1 (N=184)                2      0.011     2     1.1%   1
                         D1 (N=200)                4      0.02      3     1.5%   1.3
                         C1 (N=75)                 6      0.08      3     4%     2
Level 2 (Intermediate)   B2 (N=184)                9      0.049     5     2.7%   1
                         C2 (N=75)                 6      0.08      5     6.7%   1.2
                         D2 (N=200)                33     0.165     20    10%    1.65
Level 3 (Hard)           B3 (N=184)                30     0.163     24    13%    1.25
Table 5.2. FUNCTION MISSING (D2.1.1.1) Defects
for Labs at Different Complexity Levels
Table 5.2 shows the defect counts for the defect FUNCTION MISSING measured from nine exercises completed by four cohorts. The assessed exercises are classified into three levels based on their assigned complexity (Table 3.3). Class FUNCTION MISSING exposes students who are unable to complete lab exercises. One reason is that students cannot complete the assignment before the submission deadline. Another reason may be that students are limited to surface knowledge and are unable to apply programming concepts in practice.
To obtain a fair measure of students who are struggling with this defect category, submission numbers (instead of the number of enrolled students) are shown. The CDC% count starts from zero (A1) and climbs to roughly 10% for more complicated exercises (D2). Labs B2 and D2 involve the same programming structure and are therefore classified into the same level, but three more complicated methods are added in B2. Cohort D refers to beginners who typically had no previous Java programming experience. The trend in defect type FUNCTION MISSING is increasing in later labs. This indicates that almost all students are able to complete the functions required in low-level tasks, but many students are unable to complete all functions (usually the last one or two) when they face more complicated labs.
In Table 5.2, the high method-incompletion rates of D2 at the intermediate level and B3 at the hard level suggest that the task requirements of these two labs may be too hard for students, as approximately 10% and 13% of students are unable to complete D2 and B3 respectively. Following the suggestions of Lister & Leaney's (2003) taxonomy, one solution is to provide weak students with an easier version of the laboratory task. By performing a code inspection on these submissions, it is noted that although both the textual and conceptual counts of defect FUNCTION MISSING increase for more complicated exercises, there is little difference between the average numbers of defects per assessed submission (TDC/CDC) across the exercises. The TDC/CDC data, fluctuating slightly between 1 and 2, indicate that students are able to complete the first few methods in a code skeleton but may struggle with the last one or two functions. This suggests instructors could simplify the last few methods or make them optional for weak students. As suggested by Lister & Leaney (2003), using an easier or shortened version of the exercises may better fit the needs of weak students.
The results in Table 5.2 also suggest that providing test cases to students does not help them complete functions: 10% of students in D2 fail to complete their tasks although they are given test cases, while only 2.7% in B2 have the same problem without any automatic aids. One reason might be that although some students received feedback generated by the support tool, they were still unable to fix these problems by themselves. It is noted that students with incomplete code account for only a small proportion of the whole cohort. Further analysis of defect patterns would enhance our understanding of students' exercise completion.
5.4 Using NDT to Analyze the Impact of Automatic
Feedback
In this section, we answer the research question: how does the provision of formative feedback with programming support tools affect the defect rates in submitted assignments (Section 1.4.6)? This question addresses whether automatic feedback reduces the defect rate of submissions. To answer it, we compare the defect counts of the cohort with access to automatic tools with those of the cohort without any aids.
Table 5.3 presents both textual and conceptual counts collected from all nine exercises (1271 submissions in total). We count novices' defects at the second-level defect classes: INCOMPLETE CODE, SYNTAX ERROR, FUNCTIONAL DEFECT and EVOLVABILITY DEFECT. The largest counts belong to the defect classes EVOLVABILITY DEFECT and FUNCTIONAL DEFECT, accounting for 4450 and 1674 TDCs respectively. INCOMPLETE CODE is the third-largest defect class, affecting approximately 271 submissions. Only a small number of submissions contain syntactic defects (3.8% of the total). In the following, we analyse labs with and without automatic assistance to discuss the impact of support tools on defect reduction, by counting the FUNCTIONAL DEFECT and EVOLVABILITY DEFECT occurrences in these labs.
Main Group TDC TDC/Sub CDC CDC%
INCOMPLETE CODE 271 0.213 227 17.9%
SYNTAX ERROR 65 0.051 48 3.8%
FUNCTIONAL DEFECT 1674 1.317 414 32.6%
EVOLVABILITY DEFECT 4450 3.501 436 34.3%
Table 5.3. Distribution of Novice Defects
Functional Defect Analysis
Cohort D uses test cases and JUnit integrated with the Eclipse IDE, while Cohort B is given only the JUnit test cases prepared by the course instructor. Both cohorts complete four lab assignments and one project in one academic semester. To analyze the impact of automatic aids on reducing functional defects in assignments, the functional defect data of Labs B1 and D1 are presented in Tables 5.4 and 5.5. The defect data were collected from 384 submissions (184 completed by Cohort B and 200 by Cohort D). The main defect class FUNCTIONAL DEFECT has four sub-classes: PLAN, INITIALIZATION, CHECK and COMPUTATION. Many submissions contained functional faults: 615 faults in 184 submissions, an average of 3.342 defects per submission (Table 5.4). The right two columns of Table 5.4 show the outcomes of Cohort D, who used the instructor-provided test cases and the dynamic testing tool (JUnit) to test the functional correctness of their programs. The total textual warnings reduce to 70, and the average falls to 0.35 defects per submission.
A summary of the conceptual counts of functional defects is shown in Table 5.5. The conceptual percentage of defect CHECK is 90.8% in Lab B1 and decreases to 6% in Lab D1. The decrease in defect rate in Table 5.5 supports the observation that predefined test cases executed by automatic tools, which emphasize explicit side conditions, raised student awareness of defects in conditional guards. The results in Table 5.5 show that automatic tools and formative feedback make it easier to remove functional defects from programs.
FUNCTIONAL DEFECT    Lab B1 (Without Support Tool) (N=184)    Lab D1 (With Support Tool) (N=200)
                     TDC      TDC/Sub                          TDC      TDC/Sub
PLAN                 16       0.087                            5        0.025
INITIALIZATION       33       0.179                            28       0.14
CHECK                467      2.538                            20       0.1
COMPUTATION          99       0.538                            17       0.085
Total                615      3.342                            70       0.35
Table 5.4. Error Information (TDC) of Labs B1 and D1 on the Basis of Sub-classes of FUNCTIONAL DEFECT
FUNCTIONAL DEFECT    Lab B1 (Without Support Tool) (N=184)    Lab D1 (With Support Tool) (N=200)
                     CDC      CDC%                             CDC      CDC%
PLAN                 8        4.3%                             3        1.5%
INITIALIZATION       9        4.9%                             10       5%
CHECK                157      90.8%                            12       6%
COMPUTATION          33       17.9%                            12       6%
Total                207      90.8%                            37       7.5%
Table 5.5. Error Information (CDC) of Labs B1 and D1 on the Basis of Sub-classes of FUNCTIONAL DEFECT
The major change is shown by the TDC data for the defect class CHECK-VARIABLE (Table 5.6): from 465 occurrences observed in Lab B1 to 19 in Lab D1. This finding can be explained by students not identifying illegal parameter values unless these are explicitly specified; such defects can, however, easily be fixed once students receive error messages from executing the given JUnit tests. There is little difference between the INITIALIZATION defect counts of B1 and D1, which suggests that gaining a thorough understanding of field initialization remains a challenge for students. Another concern is that students may not understand the intent of an error message and are thereby unable to fix the defects without external help.
Both the conceptual and the textual defect counts from the group of students with tool support are lower than those from the group without support, except for the INITIALIZATION defect. This finding supports the view that the majority of students become effective users who are able to identify and remove functional faults with the given automatic feedback.
FUNCTIONAL DEFECT      Lab B1 (Without Support Tool) (N=184)    Lab D1 (With Support Tool) (N=200)
                       TDC      TDC/Sub                          TDC      TDC/Sub
FUNCTION MISSING       10       0.054                            4        0.02
LOGIC                  6        0.033                            1        0.005
ASSIGNMENT MISSING     22       0.12                             19       0.095
ASSIGNMENT MALFORMED   11       0.060                            9        0.045
CHECK-VARIABLE         465      2.527                            19       0.095
CHECK-EXPRESSION       2        0.011                            1        0.005
CHECK-FUNCTION         0        0                                0        0
MISSING UPDATE         33       0.179                            0        0
MALFORMED UPDATE       66       0.359                            17       0.085
Total                  615      3.342                            70       0.35
Table 5.6. Error Information (TDC) of Labs B1 and D1 on the Basis of Bottom-Level Classes of FUNCTIONAL DEFECT
FUNCTIONAL DEFECT      Lab B1 (Without Support Tool) (N=184)    Lab D1 (With Support Tool) (N=200)
                       CDC      CDC%                             CDC      CDC%
FUNCTION MISSING       8        1.2%                             3        1.5%
LOGIC                  5        0.6%                             1        0.5%
ASSIGNMENT MISSING     9        1.7%                             10       5%
ASSIGNMENT MALFORMED   6        2.3%                             4        2%
CHECK-VARIABLE         157      90.8%                            12       6%
CHECK-EXPRESSION       1        0.6%                             1        0.5%
CHECK-FUNCTION         0        0%                               0        0%
MISSING UPDATE         15       0%                               0        0%
MALFORMED UPDATE       33       22%                              12       6%
Total                  167      90.8%                            15       7.5%
Table 5.7. Error Information (CDC) of Labs B1 and D1 on the Basis of Bottom-Level Classes of FUNCTIONAL DEFECT
Evolvability Defect Analysis
Similar to the preceding functional analysis, Labs B1 and D1 are selected for the evolvability defect analysis. Cohort D is provided with static analysis tools (Checkstyle and PMD) as well as some configured programming rules. Table 5.8 shows the textual counts of the sub-classes of EVOLVABILITY DEFECT. The most significant results belong to the sub-class DOCUMENTATION, with 275 TDCs and 1.495 defects per submission for Lab B1; this is reduced to 3 TDCs and 0.015 defects per submission for Lab D1. With automatic tools, students are able to avoid repeating a fault once a high occurrence of that error is observed. Table 5.9 shows the conceptual defect counts (CDC). Again the results suggest that students become effective users of static analysis tools for detecting evolvability defects and then quickly remove these faults from their assignments.
EVOLVABILITY DEFECT    Lab B1 (Without Support Tool) (N=184)    Lab D1 (With Support Tool) (N=200)
                       TDC      TDC/Sub                          TDC      TDC/Sub
DOCUMENTATION          275      1.495                            3        0.015
STRUCTURE              3        0.016                            6        0.03
Total                  278      1.511                            9        0.045
Table 5.8. Error Information (TDC) of Labs B1 and D1 on the Basis of Sub-classes of EVOLVABILITY DEFECT
EVOLVABILITY DEFECT    Lab B1 (Without Support Tool) (N=184)    Lab D1 (With Support Tool) (N=200)
                       CDC      CDC%                             CDC      CDC%
DOCUMENTATION          32       17.4%                            2        1%
STRUCTURE              1        0.5%                             2        1%
Total                  33       17.9%                            4        2%
Table 5.9. Error Information (CDC) of Labs B1 and D1 on the Basis of Sub-classes of EVOLVABILITY DEFECT
Defect counts for the bottom-level classes of the EVOLVABILITY DEFECT taxonomy for Labs B1 and D1 are presented in Tables 5.10 and 5.11. Both the conceptual and the textual defect counts of COMMENTS have been reduced. By using PMD to validate the style of their programs, almost all students in Cohort D submit defect-free assignments. A spike of COMMENTS defects is detected in Lab B1 because those students are not given a code skeleton with code signatures.
However, some defect counts do not follow the expected trend. It is surprising, for example, that the support tools do not reduce ASSIGNMENT MALFORMED and ASSIGNMENT MISSING defects; the defect ASSIGNMENT MALFORMED is only slightly reduced for Lab D1. This indicates that students are unable to fix faults in constructors and in field declarations from automatically generated feedback.
EVOLVABILITY DEFECT    Lab B1 (Without Support Tool) (N=184)    Lab D1 (With Support Tool) (N=200)
                       TDC      TDC/Sub                          TDC      TDC/Sub
NAMING                 5        0.027                            2        0.01
COMMENTS               268      1.457                            0        0
CODING                 2        0.011                            1        0.005
SIZE VIOLATION         1        0.005                            1        0.005
COMPLEX CODE           0        0                                1        0.005
UNUSED CODE            2        0.011                            4        0.02
Total                  278      1.511                            9        0.045
Table 5.10. Error Information (TDC) of Labs B1 and D1 on the Basis of Bottom-Level Defect Types of EVOLVABILITY DEFECT
EVOLVABILITY DEFECT    Lab B1 (Without Support Tool) (N=184)    Lab D1 (With Support Tool) (N=200)
                       CDC      CDC%                             CDC      CDC%
NAMING                 2        1.1%                             2        1%
COMMENTS               32       17.4%                            0        0%
CODING                 2        1.1%                             1        0.5%
SIZE VIOLATION         1        0.5%                             1        0.5%
COMPLEX CODE           0        0%                               1        0.5%
UNUSED CODE            1        0.5%                             2        2%
Total                  33       17.9%                            4        2%
Table 5.11. Error Information (CDC) of Labs B1 and D1 on the Basis of Bottom-Level Defect Types of EVOLVABILITY DEFECT
Students' code is expected to have lower defect rates when automatic tools are offered in the laboratory. The results gained from an NDT analysis of Lab D1 (a functional defect proportion below 7.5% and an evolvability defect proportion below 2%) support our hypothesis that support tools have a positive impact on reducing the defect rates in novices' programs. Most students are able to debug and fix the defects that the automatic tools reveal. Students appear to become effective users of both the dynamic tool (JUnit) and the static tools (Checkstyle & PMD) for identifying and removing faults.
5.5 Using NDT to Improve Teaching Strategy
This section addresses the exploratory questions: which sorts of problems can be reduced (Section 1.4.7)? and what are the related strategies in programming teaching
(Section 1.4.8)? We first focus on the defect categories that contribute to a large
proportion of total defect counts and then identify some conceptual problems that
underlie these defects. Subsequently, we discuss some strategies that might help to
reduce the defects in submissions.
Firstly, the learning strategies and solutions derived from using NDT to analyze submissions are summarized in Table 5.12.
Question: What kinds of programming problems do novices encounter?
P1  Do not attempt
P2  Understanding programming requirements
P3  Understanding language syntax
P4  Understanding programming structures
P5  Dividing functionality into procedures
P6  Understanding class declarations and constructors
P7  Using guards to validate the range of input data
P8  Understanding code documentation

Question: What kinds of solutions may help novices in programming?
S1  Easier Version of Exercise
S2  Creative Response (Feedback)
S3  Tool Assisted Instruction (Code Signature)
S4  Tool Assisted Instruction (Documentation Configured File)
S5  Tool Assisted Instruction (Test Cases)

Table 5.12. Common Programming Problems, Teaching Strategies and Solutions
Table 5.12 lists eight problems students encountered. These problems are derived from lab observation and from submission assessment. Problems P3, P4 and P5 have also been observed by Lahtinen, Ala-Mutka & Jarvinen (2005). The teaching solutions S1-S5 in Table 5.12 are proposed to address these problems. Table 5.13 lists the most common defects identified by using NDT; for each defect, a possible teaching solution is suggested to improve teaching strategies. The problems of not attempting (P1) and understanding programming requirements (P2) are seen in the approximately 10% of students who have incomplete submissions and the roughly 7% who submit no programs over all the exercises analyzed. Understanding language syntax (P3) challenges a small proportion of novices, accounting for 3.8% of all students. Previous studies detect many syntax errors because they focus on errors made in work in progress, whereas we focus on the final assignments; the small number of syntax defects observed in this study can be explained by many syntax errors being fixed during the programming process, before formal submission. The problem of understanding program structure (P4) results from students failing to use proper structure to simplify their programs: from static analysis, 14.30% of submissions violate size limitations and 16.81% have complex structures. Using guards to validate the range of input data (P7) and understanding code documentation (P8) challenge many students. In future work we plan to investigate further how in-lab feedback helps students to overcome programming problems.
Problems   Defect Class                  Defect Distribution   Solutions
P1         D1.1.1 NO SUBMISSION           7.81%                S1
P2         D1.1.2 UNRECOGNIZED FILE       4.3%                 S2
           D1.1.3 INCOMPLETE CODE         17.9%
P3         D1.2 SYNTAX ERROR              3.8%                 S2
P4         D2.2.2.1 SIZE VIOLATION        14.30%               S4, S2
           D2.2.2.2 COMPLEX CODE          16.81%
P5         D2.1.1 PLAN                    3.21%                S3, S2
P6         D2.1.2 INITIALIZATION          5.3%                 S2
P7         D2.1.3 CHECK                   25.8%                S5, S2
           D2.1.4 COMPUTATION             9.86%
P8         D2.2.1.1 NAMING                16.97%               S3, S2
           D2.2.1.2 COMMENTS              29.11%
           D2.2.2.3 UNUSED CODE           13.38%
Table 5.13. Top Defect Classes, Underlying Novice Problems and Solutions in Teaching Strategies
Several teaching strategies are available to assist students in completing laboratory assignments. It is well known that novices struggle at the beginning of learning to program (P1) and may be unable to upload their programs on time (D1.1.1 NO SUBMISSION). For students who are stuck in learning to program, offering an easier version of an assignment is suggested (S1). Next, P2, detected by the defect D1.1.2 UNRECOGNIZED FILE, arises from an unawareness of assignment requirements; a creative response (S2) warning students of this unawareness is a possible solution. In the exercises we assessed, students are always given a Java class signature (S3) to help them write well-structured programs. To address submissions involving highly complex constructs (P4), novices are given warnings and suggestions of ways to simplify their solutions (S4). Finally, self-assessment is a significant skill programmers need in order to debug their code. In this study, the evaluation of functional correctness depends on the quality of the test cases given (S5): test cases with good coverage help novices avoid logic or semantic errors, and the feedback from failed JUnit tests guides students in correcting their problems before submitting their assignments.
5.6 Summary
In this chapter we evaluated the Novice Defect Taxonomy (NDT) by using it to analyse a large dataset of student programming assignments. The defect data were then used to answer the research questions introduced in Chapter 1. We have compared our defect categories with previous taxonomies (Section 5.1), presented a list of top defects (Section 5.2), derived defect patterns to show students' programming challenges (Section 5.3), presented evidence about the effect of in-lab feedback on the defect rates in student assignments (Section 5.4), and proposed some teaching strategies to address the problems that students face (Section 5.5).
Chapter 6 Conclusion
6.1 Contribution
In this dissertation, we establish a new defect taxonomy by analyzing a large sample of students' assignments. This defect taxonomy reveals the main program defects that occur in students' assignments and provides evidence for improving the current teaching curriculum. The main contributions of this dissertation are:
Establishing a descriptive taxonomy for classifying the defects in students' programs;
Describing a defect detection mechanism that uses automated and manual approaches to detect defects;
Establishing a common defect list containing four main defect types: code completeness defects, compilation defects, functional defects and evolvability defects;
Describing the main defect patterns that troubled students;
Presenting suggestions for improving the teaching curriculum in introductory programming courses, based on the defect detection methods used in this experiment.
6.2 Future Work
Future research extends to the following areas:
First, NDT can be applied to assignments completed by students at intermediate and advanced levels, and new defect types can be added to it, so that in the future NDT can support the analysis of defects made by students at different levels rather than by beginners only.
Subsequently, in this study we provided dynamic and static tools to students in their practical sessions, and the results show that such support tools have a positive impact on reducing defect rates in novices' programs. In the future, we can vary the support tools offered in the laboratory and observe the resulting defect rates, in the hope of discovering the most effective way to reduce defects in assignments.
Next, conducting a survey of or interviews with students would benefit the improvement of the laboratory setting and grading system. In this experiment, we only analyzed the defect types and their distributions in final submissions, rather than defects occurring during the programming process. Ko & Myers (2005) conducted a questionnaire survey of students to investigate their programming difficulties; for example, their survey asked when students introduce defects while programming and how long it takes to remove them. The next stage of our investigation can use a questionnaire or interviews to investigate the causes and removal of defects.
Finally, the analysis can be extended to defects in programs written in other programming languages. We have identified defects in Java programs only in this study, but NDT can be used as a tool to address defects in other languages. Through such further analysis, NDT is expected to become a taxonomy that covers defects in many different programming languages.
6.3 Conclusion
This work presents a new defect taxonomy (NDT) developed to classify defects in students' assignments. By using NDT to analyse students' code, we obtain a list of defects of the types completeness defect, compiler defect, functional defect and style defect. By knowing what challenged students most, instructors may improve their teaching by placing greater emphasis on the areas where students are struggling. In the quantitative data, the defect categories observed with high frequency show the difficulties that challenged many students. The defect classes indicating that no work has been done tell instructors that the exercises may be too complicated, and thereby prompt them to prefer an alternative, such as giving an easier version of the laboratory task.
By using NDT to assess student submissions, it is noted that some defect categories have low frequencies. It is still valuable to keep these categories in NDT rather than remove them, because the frequencies of these defects may fluctuate when different exercises with different constructs are involved.
Additionally, through the experimental data we found that the most common defects belong to the categories of code style defects and functional defects. Fortunately, automated testing tools and timely feedback have a positive impact on reducing these two types.
Bibliography
Ahmadzadeh, M, Elliman, D & Higgins, C 2005, 'An Analysis of Patterns of Debugging Among Novice Computer Science Students,' Proceedings of the 10th Annual SIGCSE Conference on Innovation and Technology in Computer Science Education, New York, USA, pp. 84-88.
Ala-Mutka, K 2004, 'Problems in Learning and Teaching Programming - A Literature Study for Developing Visualizations in the Codewitz-Minerva Project,' Codewitz Needs Analysis, Institute of Software Systems, Tampere University of Technology.
Ala-Mutka, K 2005, 'A Survey of Automated Assessment Approaches for Programming Assignments,' Computer Science Education, vol. 15, no. 2, pp. 83-102.
Allwood, CM 1990, 'Novices' Debugging When Programming in Pascal,' International Journal of Man-Machine Studies, vol. 33, no. 6, pp. 707-724.
Basili, VR & Perricone, BT 1984, 'Software Errors and Complexity: An Empirical Investigation,' Communications of the ACM, vol. 27, no. 1, pp. 42-52.
Basili, VR & Selby, RW 1987, 'Comparing the Effectiveness of Software Testing Strategies,' IEEE Transactions on Software Engineering, vol. 13, no. 12, pp. 1278-1296.
Cardell-Oliver, R, Zhang, L, Barady, R, Lim, YH, Naveed, A & Woodings, T 2010, 'Automated Feedback for Quality Assurance in Software Engineering Education,' Proceedings of the 2010 21st Australian Software Engineering Conference, pp. 157-164.
Chabert, J & Higginbotham, T 1976, 'An Investigation of Novice Programmer Errors in IBM 370 (OS) Assembly Language,' ACM-SE '14: Proceedings of the 14th Annual Southeast Regional Conference, New York, NY, USA, pp. 319-323.
Checkstyle 2001, Checkstyle, plug-in for Eclipse Version 5.0.0.200906281855
-final. Available from: http://checkstyle.sourceforge.net. [26 June 2009].
Chillarege, R, Bhandari, IS, Chaar, JK, Halliday, MJ, Moebus, DS, Ray, BK & Wong, M-Y 1992, 'Orthogonal Defect Classification - A Concept for In-Process Measurements,' IEEE Transactions on Software Engineering, vol. 18, no. 11, pp. 943-956.
Chillarege, R, Kao, W-L & Condit, RG 1991, 'Defect Type and its Impact on the Growth Curve,' Proceedings of the 13th International Conference on Software Engineering, Austin, TX, USA, pp. 246-255.
Coull, N, Duncan, I, Archibald, J & Lund, G 2003, 'Helping Novice Programmers Interpret Compiler Error Messages,' Proceedings of the 4th Annual LTSN-ICS Conference, National University of Ireland, Galway, pp. 26-28.
Detienne, F 1990, 'Expert Programming Knowledge: A Scheme-based Approach,' in Psychology of Programming, eds J-M Hoc, TRG Green & R Gilmore, Academic Press, People and Computer Series, pp. 205-222.
Eclipse, n.d., Eclipse Foundation Open Source Community. Available from: http://www.eclipse.org/. [12 December 2008].
Edwards, S 2004, 'Improving Student Performance by Evaluating How Well Students Test Their Own Programs,' ACM Journal of Educational Resources in Computing, vol. 3, no. 3, pp. 1-24.
Endres, A 1975, 'An Analysis of Errors and Their Causes in System Programs,' Proceedings of the International Conference on Reliable Software, Los Angeles, California, USA, pp. 327-336.
Findbugs, n.d., Find Bugs in Java Program. Available from:
http://findbugs.sourceforge.net. [26 June 2010].
Florac, WA 1992, Software Quality Measurement: a Framework for Counting
Problems and Defects, CMU/SEI-92-TR-022, Software Engineering Institute,
Carnegie Mellon University, Pittsburgh, Pennsylvania.
Flowers, T, Carver, CA & Jackson, J 2004, 'Empowering Students and Building Confidence in Novice Programmers,' The 34th ASEE/IEEE Frontiers in Education Conference, vol. 1, pp. T3H/10-T3H/13.
Gugerty, L & Olson, G 1986, 'Debugging by Skilled and Novice Programmers,' Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, New York, USA, pp. 171-174.
Hristova, M, Misra, A, Rutter, M & Mercuri, R 2003, 'Identifying and Correcting Java Programming Errors for Introductory Computer Science Students,' ACM SIGCSE Bulletin, vol. 35, no. 1, pp. 153-156.
IEEE 2010, IEEE Standard Classification for Software Anomalies, IEEE Std.
1044-2009.
ISO/IEC 2009, Systems and Software Engineering- Vocabulary, ISO/IEC
24765-2009.
Jackson, D 2002, 'A Semi-Automated Approach to On-line Assessment,' in Proceedings of the 5th Annual SIGCSE Conference on Innovation and Technology in Computer Science Education, pp. 164-167.
Jackson, J, Cobb, M & Carver, C 2004, 'Identifying Top Java Errors for Novice Programmers,' paper presented at the 35th ASEE/IEEE Frontiers in Education Conference, pp. T4C-24-T4C-27.
Java Programming (CITS1200) n.d., School of Computer Science and Software
Engineering, Available from:
<http://undergraduate.csse.uwa.edu.au/units/CITS1200>, [24 March 2009].
JUnit4, n.d., JUnit4, Resources for Test Driven Development. Available from:
http://www.junit.org. [04 March 2009]
Kaner, C 2003, 'What is a Good Test Case?,' paper presented at the STAR East 2003 Conference, Orlando, pp. 1-16.
Kessler, CM & Anderson, JR 1989, 'Learning Flow of Control: Recursion and Iterative Procedures,' Human-Computer Interaction, vol. 2, no. 2, pp. 135-166.
Ko, AJ & Myers, BA 2005, 'A Framework and Methodology for Studying the Causes of Software Errors in Programming Systems,' Journal of Visual Languages and Computing, vol. 16, pp. 41-84.
Kolling, M 1999, BlueJ - The Interactive Java Environment. Available from: http://www.bluej.org/. [14 May 2009].
Kopec, D, Yarmish, G & Cheung, P 2007, 'A Description and Study of Intermediate Student Programmer Errors,' Computer Human Error, vol. 39, no. 2, pp. 146-156.
Lahtinen, E, Ala-Mutka, K & Jarvinen, H-M 2005, 'A Study of the Difficulties of Novice Programmers,' ACM SIGCSE Bulletin, vol. 37, no. 3, pp. 14-18.
Lister, R & Leaney, J 2003, 'First Year Programming: Let All the Flowers Bloom,' Proceedings of the 5th Australasian Conference on Computing Education, vol. 20, pp. 221-230.
Mantyla, M & Lassenius, C 2009, 'What Types of Defects Are Really Discovered in Code Reviews?,' IEEE Transactions on Software Engineering, vol. 35, no. 3, pp. 430-448.
McDonald, C 2009, Computer Science and Software Engineering Web-based Software Supporting Teaching. Available from: https://secure.csse.uwa.edu.au/chris/portfolio/web-software.html. [27 November 2009].
Musa, JD, Iannino, A & Okumoto, K 1987, Software Reliability-Measurement,
Prediction, Application, New York, McGraw-Hill.
Pea, R 1986, 'Language-Independent Conceptual Bugs in Novice Programming,' Journal of Educational Computing Research, vol. 2, pp. 25-36.
PMD 2002, PMD plug-in for Eclipse version 3.2.6. v200903300643. Available
from: http://pmd.sourceforge.net/eclipse. [14 July 2009].
Robins, A 2010, 'Learning Edge Momentum: A New Account of Outcomes in CS1,' Computer Science Education, vol. 20, no. 1, pp. 37-71.
Robins, A, Haden, P & Garner, S 2006, 'Problem Distributions in a CS1 Course,' in Proceedings of the 8th Australasian Computing Education Conference, eds D Tolhurst & S Mann, Hobart, Australia, pp. 165-173.
Robins, A, Rountree, J & Rountree, N 2003, 'Learning and Teaching Programming: A Review and Discussion,' Computer Science Education, vol. 13, no. 2, pp. 137-172.
Siy, H & Votta, L 2001, 'Does the Modern Code Inspection Have Value?,' Proceedings of the International Conference on Software Maintenance, pp. 281-289.
Soloway, E & Spohrer, J 1989a, 'Novice Mistakes: Are the Folk Wisdoms Correct?,' in Studying the Novice Programmer, eds E Soloway & J Spohrer, Lawrence Erlbaum Associates, pp. 401-418.
Soloway, E & Spohrer, J 1989b, Studying the Novice Programmer. Hillsdale,
New Jersey, Lawrence Erlbaum Associates.
Truong, N, Roe, P & Bancroft, P 2004, 'Static Analysis of Students' Java Programs,' Proceedings of the 6th Australasian Computing Education Conference, eds R Lister & AL Young, Dunedin, New Zealand, vol. 30, pp. 317-325.
Whalley, J & Philpott, A 2011, 'A Unit Testing Approach to Building Novice Programmers' Skills and Confidence,' Proceedings of the Australasian Computing Education Conference (ACE 2011), Perth, Australia, CRPIT, vol. 114, eds J Hamer & M de Raadt, ACS, pp. 113-118.