Effects of Clean Code on Understandability
An Experiment and Analysis

Henning Grimeland Koller
Master's Thesis, Spring 2016



Abstract

Having understandable and readable code is important in the software development world, as small changes can be made difficult and complex by poor code. Clean Code, proposed by software engineer Robert C. Martin, is a set of rules and principles meant to keep code understandable and maintainable.

This thesis studied the effect of Clean Code on understandability. After consulting previous research on refactoring and its effect on understandability, we identified a need for a measurement other than calculated code quality metrics. We conducted an experiment in which the time used to solve coding tasks served as a measurement of understandability. For the experiment, a system with characteristics of legacy code was refactored to be more "clean". Two groups of participants were asked to solve three small coding tasks on this system: an experimental group on the "clean" code, and a control group on the legacy code.

We expected the Clean Code version to be more understandable, and that the experimental group would therefore spend less time solving the tasks. Contrary to our expectations, two of the three assignments were solved faster by the control group. These results indicate that Clean Code does not have an immediate effect on understandability. However, we observed that developers working with Clean Code wrote code of higher quality.


Acknowledgements

I proudly present my master's thesis on the effect of Clean Code on understandability. It was written at the Department of Informatics, University of Oslo, during the years 2015/2016. After two years of work, I would like to thank the people who made it all possible.

Firstly, I would like to thank my supervisors, Sigmund Marius Nilssen and Yngve Lindsjørn, for their great help and advice; especially Sigmund for his thorough and constructive feedback from start to end. Secondly, all of my family for their encouragement and kind words; in particular my brother Martin for his great interest and our long discussions. Also, thank you Tom Erik for reading my thesis and giving me valuable feedback.

To all of my good friends, thank you for five incredibly fun and memorable years at Assembler. And last but not least, my girlfriend Vilde: thank you for all your love and support.


Contents

1 Introduction 1
    1.1 Research Question 2
    1.2 Outline 4

2 Background 5
    2.1 Understandability 5
    2.2 Clean Code 6
        2.2.1 Clean Tests 8
    2.3 Refactoring 10
    2.4 Code Quality 10
    2.5 The Effect of Clean Code on Code Quality 12
        2.5.1 Redundancy 12
        2.5.2 Unit Size 12
        2.5.3 Complexity 12
        2.5.4 Volume 13
        2.5.5 Unit Interface Size 13
        2.5.6 Coupling 13
    2.6 Related Work 14
        2.6.1 The Effect of Refactoring on Code Quality 14
        2.6.2 The Effect of Refactoring on Productivity 15
        2.6.3 The Effect of Refactoring on Understandability 16

3 Research Method 19
    3.1 Experimental Research Design 19
        3.1.1 Randomized Two-Group Design 21
        3.1.2 Dealing With Error Variance 21
    3.2 Implementation of the Experiment 22
    3.3 The Assignments 23
        3.3.1 Assignment 1: Implementing New Significant Functionality 23
        3.3.2 Assignment 2: Change Current Functionality 24
        3.3.3 Assignment 3: Bug Fixing 24
    3.4 The Participants 25
    3.5 First Run of the Experiment 26
        3.5.1 Assignment 1 26
        3.5.2 Assignment 2 26
        3.5.3 Assignment 3 27
        3.5.4 Learnings From the First Run 27

4 System and Refactoring 29
    4.1 The System 29
        4.1.1 Legacy Code 30
        4.1.2 What is Legacy in the System? 31
        4.1.3 Code Smells 31
        4.1.4 The Different Code Smells in the System 32
        4.1.5 Summary 33
    4.2 Refactoring 33
        4.2.1 Extracting Repeated Code 34
        4.2.2 Better Names and Abstraction 35
        4.2.3 Reducing Complexity 37
        4.2.4 Error Handling 37
        4.2.5 Responsibility 38
        4.2.6 Unit Tests 39

5 Results 41
    5.1 Errors 41
    5.2 First Run of the Experiment 42
        5.2.1 Assignment 1 42
        5.2.2 Assignment 2 42
        5.2.3 Assignment 3 43
    5.3 Second Run of the Experiment 43
        5.3.1 Assignment 1 43
        5.3.2 Assignment 2 43
        5.3.3 Assignment 3 44

6 Discussion 51
    6.1 Assignment 1 51
        6.1.1 First Run of the Experiment 51
        6.1.2 Second Run of the Experiment 52
        6.1.3 Discussing Assignment 1 52
    6.2 Assignment 2 54
        6.2.1 First Run of the Experiment 54
        6.2.2 Second Run of the Experiment 54
        6.2.3 Discussing Assignment 2 54
    6.3 Assignment 3 55
        6.3.1 First Run of the Experiment 56
        6.3.2 Second Run of the Experiment 56
        6.3.3 Discussing Assignment 3 56
    6.4 Implications For Practice 58
    6.5 Comparing our Result to Previous Study 59
    6.6 Limitations 60
        6.6.1 Quantitative Problem 60
        6.6.2 The Participants 60
        6.6.3 Is Assignment 2 Fair? 60
        6.6.4 Not a Real Working Environment 61

7 Conclusion 63
    7.1 Answering the Research Questions 63
    7.2 Future Work 65

A Original Code 67
    A.1 Create User 67
    A.2 Update User 69
    A.3 Get User 70
    A.4 Get a User By UUID 71
    A.5 Authorize User 71
    A.6 Export Users 72

B Refactored Code 75
    B.1 Create User 75
    B.2 Update User 76
    B.3 Get User 76
    B.4 Get a User By UUID 77
    B.5 Authorize User 77
    B.6 Find User 78
    B.7 Mixed Validation 78
    B.8 Validate User Fields After Creation 79
    B.9 Validate User Fields After Update 80
    B.10 User Pager 80
    B.11 Users Exporting Own Data 81
    B.12 CSV-Exporter 82
    B.13 Test UpdatePassword 82
    B.14 Test getUserWithUsername 83


List of Figures

5.1 First Run of the Experiment: Table Legend 45
5.2 First Run of the Experiment: Group A Data 45
5.3 First Run of the Experiment: Group B Data 45
5.4 First Run of the Experiment: Time used by participants on assignment 1 46
5.5 First Run of the Experiment: Time used by participants on assignment 2 46
5.6 First Run of the Experiment: Time used by participants on assignment 3 47
5.7 First Run of the Experiment: Average time per assignment 47
5.8 Second Run of the Experiment: Table Legend 48
5.9 Second Run of the Experiment: Group A Data 48
5.10 Second Run of the Experiment: Group B Data 48
5.11 Second Run of the Experiment: Time used by participants on assignment 1 49
5.12 Second Run of the Experiment: Time used by participants on assignment 2 49
5.13 Second Run of the Experiment: Time used by participants on assignment 3 50
5.14 Second Run of the Experiment: Average time per assignment 50


Listings

3.1 Inserted Bug 24
A.1 Original: Create User 67
A.2 Original: Update User 69
A.3 Original: Get User 70
A.4 Original: Get a User By UUID 71
A.5 Original: Authorize User 71
A.6 Original: Export Users 72
B.1 Refactored: Create User 75
B.2 Refactored: Update User 76
B.3 Refactored: Get User 76
B.4 Refactored: Get a User By UUID 77
B.5 Refactored: Authorize User 77
B.6 Refactored: Find User 78
B.7 Refactored: Mixed Validation 78
B.8 Refactored: Validate User Fields After Creation 79
B.9 Refactored: Validate User Fields After Update 80
B.10 Refactored: User Pager 80
B.11 Refactored: Users Exporting Own Data 81
B.12 Refactored: CSV-Exporter 82
B.13 Refactored: Test UpdatePassword 82
B.14 Refactored: Test getUserWithUsername 83


Chapter 1

Introduction

Imagine a desk. The desk is messy; there are papers floating around and no structure in the documents whatsoever. Finishing the work started at this desk could prove difficult. To do so, one would have to go through the mess and try to make sense of it before even starting on the actual work. This could take hours, while the actual work itself only takes a couple of minutes.

This is a very common scenario in the world of a software developer. Short, easy tasks are made complicated by untidy and "dirty" code. The developer has to spend time understanding this code before he can confidently make changes. Or even worse, if he is in a hurry he may be tempted to create a piece of code that further increases the mess.

Back to the earlier example. What if the desk were uncluttered and neat, with papers organized and stored systematically in folders? This would make the assignment much easier. Now one could quickly pick up on the former worker's trail of thought and understand what needs to be done. The same goes for software development. Writing structured and tidy code makes it much easier for both the writer and the next developer to work with the code later on. But writing good and structured code is not that easy.

Robert C. Martin talks about "code-sense" in his book Clean Code[17, p. 7]. Code-sense is not only the ability to look at a piece of code and immediately tell whether it is good or bad; it is also the ability to see what can be done to make the code good. According to Martin, some are born with it, while others must work hard to achieve it[17, p. 7]. But even with good code-sense we might end up writing bad code.

One of the largest threats to good code is time constraints. Whenever a programmer is in a rush, he might be tempted to write bad code just to finish on time. "It is just a small piece of code, I can clean this up later." But he has just "broken a window" in the project. Andrew Hunt and David Thomas discuss the "Broken Window Theory" in The Pragmatic Programmer[12, p. 5]. A house with broken windows gives the impression that nobody cares about it, so others also stop caring. Sooner or later the house will be completely destroyed. The poor piece of code the programmer created is such a broken window. It can create the feeling that nobody cares about the code, and the next programmer working with it might not bother either.

Martin Fowler tells a story of another danger to good code in the preface of his book Refactoring: Improving the Design of Existing Code[11]. Even though programmers know about problematic code in their project, management does not allow them to spend time cleaning it up, because doing so will not add to the features of the system. The customer cannot see the code, so it does not matter if it is troublesome. This way of thinking is why bad code is often tolerated; it may look more profitable to spend time adding new features instead of halting production and cleaning up. But in the long run, as in Fowler's story, the poor code can bring the whole project to its knees because of how difficult it is to work with. Several practices have been proposed to cope with these challenges. Clean Code by Robert C. Martin is arguably one of the most renowned.

1.1 Research Question

Studies have been conducted on how to measure or evaluate code quality[4, 5]. These studies establish key points in code quality and a set of metrics used to measure the quality of software. They also discuss how to use these metrics to perform measurements. What they do not discuss in much detail is how to achieve higher code quality. To find out what might increase quality, we read studies on refactoring and its effect on software quality[2, 9, 14, 19, 21, 22]. These studies reach different conclusions about the effects of refactoring on the quality of the code.

After reading these studies we wanted to focus more on understandability. We read studies about software understandability and how to measure it[8, 15, 16, 20]. These studies propose different models and methods for measuring and evaluating the understandability of software, and of course discuss what software understandability really is. But when we tried to find studies on how to improve software understandability, we found very few. So we decided to study Clean Code and its effect on understandability.

We wanted to study the effect of Clean Code because it is a widely accepted method of writing good, maintainable code. Since we could not find any studies on the actual effect of Clean Code, we decided to use Clean Code as our guide to writing understandable code, and see whether it does improve the understandability of code.


All of the previously mentioned studies use calculation to determine whether refactoring has a positive effect on code quality or productivity. Even though these studies include understandability, our opinion is that understandability cannot be measured the same way as code quality. We agree that low complexity and few lines of code could indicate that the code is understandable, but that might not be the case. The understandability of code is what enables developers to comprehend its meaning.

Therefore, we decided to use time as a measurement of understandability. The reason is that we want to find empirical evidence by observing participants understand and solve small coding tasks. The argument we want to make is that understandable code requires less time to become familiar with, so coding can begin earlier. Hence, less time is used solving the assignments. Clean Code also comprises rules and principles that are less calculable: how to make code descriptive and informative to the programmer.

We focus on small coding tasks because it is easier to determine the understandability of the code with smaller tasks. With larger coding tasks, more time is spent planning and implementing, as one is making large changes in the system. In small tasks, on the other hand, time is mostly spent understanding the system and the surrounding parts dependent on the change, rather than implementing.

This leads to our main research question:

• Main research question: Will Clean Code improve the time used to solve small coding tasks?

To determine whether small coding tasks are solved faster on clean code, we first need to narrow our question to specific coding tasks. We decided to focus on three of the most common tasks performed by developers: implementing new significant functionality, changing current functionality, and bug fixing. This gives us three sub-research questions:

• Sub-research question 1: Will Clean Code improve the time used to implement new significant functionality?

• Sub-research question 2: Will Clean Code improve the time used to change current functionality?

• Sub-research question 3: Will Clean Code improve the time used to find and solve a bug?

Only after answering the sub-research questions will we have enough knowledge to answer our main research question.

To answer our research questions we conducted a quantitative, empirical experiment. Volunteer participants were placed in two groups and asked to solve three different small assignments on a system. One group worked on the original version of the system, while the other worked on a refactored version following the principles of Clean Code. The assignments are designed to answer our sub-research questions, each focusing on one question. Our research method is discussed in more detail in chapter 3.

1.2 Outline

Chapter 2 first gives an overview of understandability in the software development world, before describing Clean Code and code quality. It finishes by discussing the effect of Clean Code on code quality and related work.

Our research method is presented in chapter 3. Here we also discuss the implementation of our experiment, the assignments, and the participants.

Next we present the system used in the experiment and the refactoring. In section 4.1 we give an introduction to the system, identify different problems with the code, and justify why it is a fitting system to use. In section 4.2 we discuss the refactoring done to the system to make it more clean.

In chapter 5, the results from the experiment are presented.

The results are discussed and analyzed in chapter 6, where limitations of the experiment are also identified. After discussing the results, we conclude our research in chapter 7 and list our thoughts on future work.


Chapter 2

Background

2.1 Understandability

We want to examine the effects that Clean Code can have on understandability. However, we first need to establish what understandability means in the field of software development. Boehm defines understandability the following way: "Code possesses the characteristic understandability to the extent that its purpose is clear to the inspector"[5]. He goes on to say that this implies that variable names are used consistently and modules are self-descriptive. This description has a lot in common with the ideas Robert C. Martin discusses in Clean Code. We discuss this in more detail in section 2.2.

Understandability is part of a system's maintainability[5], because a developer needs to understand the system in order to maintain it. This makes understandability crucial in software development. If a system is not understandable, maintaining it will prove difficult for new developers. One could say that the original developers have not properly communicated the purpose of the code if the system is hard to understand. Thought of this way, source code is a form of communication between developers[20]. So, for a system to be maintainable and evolvable, developers need to communicate with each other through the code. As with any other kind of communication, code can be misunderstood or misinterpreted, and if a new developer misunderstands code there is a higher chance that errors are introduced.

Several metrics and models to measure understandability have been proposed[15, 16]. These metrics focus on the size of the system, the number of operands and operators, or the distance between usages of variables:

1. Code spatial complexity (CSC) is a measurement of the effort needed to understand the logic in a module. CSC is defined as the average distance, in terms of lines of code, between the definition and use of modules[8]. The higher the number, the more cognitive effort is required to understand.

2. Data spatial complexity is defined as the average distance between the usages of a variable, and is a measurement of the effort needed to understand the use of a variable[8]. As with CSC, a higher number indicates that more cognitive effort is needed to understand.
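As an illustration, both spatial-complexity measures above can be computed directly from line numbers. The sketch below is our own; the cited work does not prescribe an implementation, and the function names and input formats are hypothetical.

```python
def code_spatial_complexity(def_use_pairs):
    """CSC: average distance, in lines of code, between the
    definition of a module and each place it is used."""
    distances = [abs(use - definition) for definition, use in def_use_pairs]
    return sum(distances) / len(distances)

def data_spatial_complexity(use_lines):
    """Average distance, in lines, between consecutive usages
    of a single variable."""
    gaps = [b - a for a, b in zip(use_lines, use_lines[1:])]
    return sum(gaps) / len(gaps)

# A module defined on line 10 and used on lines 40 and 100:
# distances of 30 and 90 lines, so CSC = 60.0.
print(code_spatial_complexity([(10, 40), (10, 100)]))

# A variable used on lines 3, 10 and 24: gaps of 7 and 14 lines,
# so data spatial complexity = 10.5.
print(data_spatial_complexity([3, 10, 24]))
```

In both cases a higher value means the reader must hold more intervening code in mind, which is exactly the cognitive effort the metrics try to capture.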

What can we do to make sure our code is understandable? Understandability is a very subjective matter, which makes it hard to come up with a good answer. Robert C. Martin proposes his answers to this difficult question in his book Clean Code: A Handbook of Agile Software Craftsmanship. In section 2.2 we discuss his ideas and rules which, in his opinion, create better and more understandable code.

2.2 Clean Code

"Clean code is simple and direct. Clean code reads like well-written prose." - Grady Booch[17, p. 8]

In 2008 Robert C. Martin published the book Clean Code: A Handbook of Agile Software Craftsmanship[17]. It describes the principles, patterns, and practices of writing Clean Code. Clean Code in a nutshell is as Grady Booch describes it: simple, elegant, direct, and reading like a story[17, p. 8].

"You know you are working on clean code when each routine you read turns out to be pretty much what you expected." - Ward Cunningham[17, p. 11]

Understandability is a major part of Clean Code. The code should clearly show what it is trying to do, and not hide anything from the reader. There should be no surprises. Martin suggests several rules throughout the book for how clean code should be written, touching upon everything from classes to comments. Some of the most important rules mentioned in Clean Code are:

• Meaningful Names [17, p. 17]
Names are incredibly important when writing code. We name everything, and names describe what is going on in the code. A good name states the intention of the variable, class, or function, and the names together make the context of the code clear. The names are what give the code meaning.

• Functions should do one thing [17, p. 35]
A function should do only one thing. It should do what its name states, and nothing more. A function doing multiple things quickly becomes confusing, and the potential for side effects increases. Side effects are dangerous and can lead to unwanted incidents. Every statement in the function should be no more than one abstraction level below the name of the function.

• Functions should be small [17, p. 34]
Functions should be as small as possible. This forces one to further abstract the functions, which in turn strengthens the previous rule.

• Don't repeat yourself [17, p. 48]
Never repeat code; duplication indicates a lack of abstraction. With duplication, bugs are more likely to be introduced: whenever a change is needed, the duplication forces the change to be made in multiple places, and if one place is forgotten, errors and bugs can follow. Extracting the duplicated code into a single function removes this problem, as the change is then needed in only one place.

• Single Responsibility Principle [17, p. 138]
The Single Responsibility Principle, as the name implies, means that a class or module should have only one responsibility. One could also say that a class should have only one reason to change.

Martin discusses other rules, but we will focus on the ones in the list above. Code following these rules is what we will refer to as clean for the rest of this thesis.
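To make these rules concrete, consider a small hypothetical example of our own (it is not taken from Clean Code or from the experiment system). The first version violates the naming, one-thing, and duplication rules; the second follows them:

```python
# "Dirty": cryptic names, one function mixing iteration and pricing,
# and the discount rule duplicated across two branches.
def proc(d):
    t = 0
    for i in d:
        if i["kind"] == "book":
            t = t + i["price"] - i["price"] * 0.1
        elif i["kind"] == "ebook":
            t = t + i["price"] - i["price"] * 0.1
        else:
            t = t + i["price"]
    return t

# "Clean": intention-revealing names, each function does one thing,
# and the discount rule lives in exactly one place.
BOOK_DISCOUNT = 0.1
DISCOUNTED_KINDS = {"book", "ebook"}

def price_of(item):
    if item["kind"] in DISCOUNTED_KINDS:
        return item["price"] * (1 - BOOK_DISCOUNT)
    return item["price"]

def order_total(items):
    return sum(price_of(item) for item in items)
```

Both versions compute the same totals, but in the clean version a change to the discount rule touches one line, and each function reads at a single level of abstraction.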

These rules are simple, and yet one might not have thought about them before. They all aim to make the code more understandable and easier to work with. In "Toward a Measurement Framework for Example Program Quality"[7], Börstler et al. compare two implementations of a calendar program to investigate desirable properties of programming examples for students. They argue that "a good example must effectively communicate the concept to be taught". This is similar to what Martin says about code clearly showing its intent[17, p. 55]. Börstler et al. named one program the Beauty and the other the Beast. They find the Beauty beautiful for several reasons very similar to the rules we just listed:

1. "The key concepts in the problem domain are modelled as separate classes"[7]. Instead of one large class, they created several classes, each with a single responsibility. The Month class, for example, has only one responsibility: to know how many days there are in each month. The implementation of the classes follows the Single Responsibility Principle.

2. "The implementations are very simple and small"[7]. The functions and classes in the Beauty are small and do only one thing, in line with Martin's rules about functions.

3. "Carefully chosen identifier names enhance the readability"[7]. The names of the variables, functions, and classes in the Beauty describe very well why they are there and what they do, much like Martin's rule about meaningful names.

4. "The decomposition supports independent development and testing"[7]. When all the key concepts are modelled as separate classes, each individual part can be tested more easily.

The Beast, on the other hand, is the complete opposite. It is essentially one long function where all the logic is done, and this function is also highly nested, making it even more complex. As Börstler et al. point out, this approach makes it difficult to include identifiers that help with comprehension. There are clear similarities between Börstler et al.'s reasons for why the Beauty is beautiful and the Beast is a beast, and what Martin describes as "clean code" and "bad code".
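A class in the spirit of the Beauty's Month might look like the following. This is our own reconstruction for illustration; Börstler et al.'s actual code is not reproduced here.

```python
class Month:
    """Models one key concept with a single responsibility:
    knowing how many days there are in each month (non-leap year)."""

    DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

    def __init__(self, number):
        self.number = number  # 1 = January, ..., 12 = December

    def number_of_days(self):
        return Month.DAYS_IN_MONTH[self.number - 1]
```

Because the class knows about nothing but month lengths, it can be tested in complete isolation, which is precisely the decomposition benefit listed in point 4 above.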

2.2.1 Clean Tests

Börstler et al. speak of the benefit of being able to test each of the classes in the Beauty separately. Martin discusses the same, only he refers to the Single Responsibility Principle. Testing is not exclusive to Clean Code, but it is nevertheless an important topic. Unit tests are what make the code safe to change. "Test code is just as important as production code. It requires thought, design and care. It must be kept as clean as production code."[17, p. 124]

Unit tests should evolve with the system, and if the tests are dirty it will be hard to keep them up to date. To keep unit tests clean they need to be precise, simple, and readable, much like the code they are testing. Many of the same rules apply to unit tests, but Martin supplements them with additional rules.

Single Concept Rule

For a test to be precise it should focus on one thing. When a test contains many assertions it starts to get confusing, and it becomes difficult to see what the test is actually testing. The problem is not necessarily the number of asserts but the number of concepts being tested. With several concepts in the same test, the reader has to use more effort to understand what each is testing[17, p. 131]. Therefore Martin proposes the "Single Concept Rule": minimize the number of asserts, and test only one concept per test. The rule forces tests to be written as short and understandable as possible.
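The difference is easiest to see side by side. The tests below are a hypothetical illustration using Python's unittest module; they are not taken from the experiment system:

```python
import unittest

class Stack:
    """A minimal stack, present only to have something to test."""
    def __init__(self):
        self._items = []
    def push(self, item):
        self._items.append(item)
    def pop(self):
        return self._items.pop()
    def is_empty(self):
        return not self._items

# Violates the Single Concept Rule: emptiness, pushing and popping
# are three concepts crammed into a single test.
class MessyStackTest(unittest.TestCase):
    def test_stack(self):
        s = Stack()
        self.assertTrue(s.is_empty())
        s.push(1)
        self.assertFalse(s.is_empty())
        self.assertEqual(s.pop(), 1)

# One concept per test: each name states exactly what is verified,
# so a failure points straight at the broken behaviour.
class CleanStackTest(unittest.TestCase):
    def test_new_stack_is_empty(self):
        self.assertTrue(Stack().is_empty())

    def test_pop_returns_the_last_pushed_item(self):
        s = Stack()
        s.push("first")
        s.push("second")
        self.assertEqual(s.pop(), "second")
```

When test_stack fails, the reader must re-read the whole test to find out which of the three concepts broke; the clean tests name the broken concept in the failure report itself.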

F.I.R.S.T

The acronym F.I.R.S.T stands for five further rules clean tests should follow[17, p. 132]:

• FastTests should be fast. If a test is slow one might be reluctant to runit after a change. The faster a test runs, the more frequently it willbe used. One makes a change and run the test. Failing to have fasttests could lead to errors not being discovered at an early stage, andrepairing them might be more costly when they finally are discovered.

• IndependentA test should only depend on itself. It should be able to be runindependently and not rely on other tests being run for it to work.If tests are dependent on each other only one test needs to break tobreak the others. This creates a series of errors making debuggingharder.

• Repeatable: A test should be able to run repeatedly, anywhere, at any time. It should not depend on the environment or any external services. This means that it should run without the Internet and outside the development environment. Without this, uncertainty is introduced, as one cannot know whether the test fails or passes because of factors other than wrong code.

• Self-Validating: For a test to be self-validating it needs to have a boolean output: pass or fail. If a test is not self-validating it is hard to make it any of the above. If one has to compare two files to see whether the test passes, it is not very fast or independent.

• Timely: A test should be written before the code it is supposed to test. This way the code must be designed to be testable. Otherwise one might find the code hard to test because it was not designed with testing in mind, and decide not to test it at all.

The most important property of tests, as Martin stresses several times, is that they provide safety when changing the code[17, p. 124]. If the tests are dirty and hard to change, this safety is lost, as the tests cannot evolve with the system. Both F.I.R.S.T and the Single Concept Rule aim to help with this problem.
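The rules above can be illustrated with a small sketch. Plain Java stands in for a test framework such as JUnit, and the unit under test (a hypothetical isBlank helper) is invented for illustration, not taken from the system used in the experiment:

```java
// A minimal sketch of a clean unit test following F.I.R.S.T and the
// Single Concept Rule, using plain Java instead of a test framework.
public class BlankStringTest {

    // The unit under test: true for null, empty or whitespace-only input.
    static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    // One concept per test: each method checks a single idea and is
    // self-validating -- it yields pass/fail as a boolean.
    static boolean testNullIsBlank()       { return isBlank(null); }
    static boolean testWhitespaceIsBlank() { return isBlank("   "); }
    static boolean testTextIsNotBlank()    { return !isBlank("clean"); }

    public static void main(String[] args) {
        // Fast, independent and repeatable: no I/O, no shared state,
        // no reliance on the environment or on test ordering.
        boolean allPassed = testNullIsBlank()
                && testWhitespaceIsBlank()
                && testTextIsNotBlank();
        System.out.println(allPassed ? "PASS" : "FAIL");
    }
}
```

Each test runs in isolation and in any order, which is exactly what Independent and Repeatable demand.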


2.3 Refactoring

As a system grows and ages it might need to be cleaned up. This cleaning is what we call refactoring. Martin Fowler defines refactoring the following way:

"Refactoring is the process of changing a software system in such a way that it does not alter the external behavior of the code yet improves its internal structure"[11, p. xvi].

In his book Refactoring: Improving the Design of Existing Code[11], Fowler presents several techniques to improve the internal structure of software systems. These techniques can be used to reduce redundancy, divide responsibility to reduce unit size, or make the code more readable.
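As a concrete illustration of one such technique, "Extract Method", here is an invented example: a calculation embedded in a formatting method is pulled out into a named helper. The invoice example is hypothetical and not taken from any system discussed in this thesis:

```java
// A sketch of Fowler's "Extract Method": the summing logic is extracted
// into a helper whose name documents what it does, reducing unit size
// without changing external behavior.
public class ExtractMethodExample {

    // Before: one method both computes and formats in a single block.
    static String invoiceBefore(double[] amounts) {
        double total = 0;
        for (double a : amounts) total += a;
        return "Total owed: " + total;
    }

    // After: the calculation lives in a method with a descriptive name.
    static String invoiceAfter(double[] amounts) {
        return "Total owed: " + outstanding(amounts);
    }

    static double outstanding(double[] amounts) {
        double total = 0;
        for (double a : amounts) total += a;
        return total;
    }

    public static void main(String[] args) {
        double[] amounts = {10.0, 20.0, 12.5};
        // External behavior is unchanged -- the hallmark of refactoring.
        System.out.println(invoiceBefore(amounts).equals(invoiceAfter(amounts)));
    }
}
```

The observable output is identical before and after, which is exactly Fowler's definition in action.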

What is important to note from Fowler's definition is that the external behavior should not be altered. In other words, refactoring does not give value to the customer immediately. Studies do however show that it can lead to improved code quality[9, 19, 22], which in turn can improve the development process and allow faster releases.

2.4 Code Quality

ISO/IEC 25010[13] is the international standard for evaluation of software quality. The standard defines the quality of a system in the following way: "The quality of a system is the degree to which the system satisfies the stated and implied needs of its various stakeholders, and thus provides value". "Needs" refers to performance, maintainability, security, etc.[13].

So far we have discussed several principles and techniques that are supposed to make code more understandable and readable. Understandability and readability are both very subjective, which makes it hard to determine whether changes implemented for understandability increase code quality. There are other, more objective, metrics which give an understanding of aspects of code quality other than understandability.

We will focus on the metrics used to determine software maintainability[4]. Maintainability represents the degree of effectiveness and efficiency with which a product or system can be modified to improve it, correct it or adapt it to changes in environment and requirements[13].

Baggen et al. support this definition by claiming that "when a change is needed in software, the quality of the software has an impact on how easy it is"[4]:


1) To determine where and how that change can be performed.
2) To implement the change.
3) To avoid unexpected effects of the change.
4) To validate the changes performed.

The Software Improvement Group uses six different properties to calculate the maintainability quality of code[4]:

• Volume: As the system grows it gets less maintainable due to the size and amount of information in the system. A possible metric is lines of code.

• Redundancy: It takes more time to make a change in redundant code than in non-redundant code. A possible metric is the percentage of duplicated code.

• Unit size: The smaller the lowest-level units, the easier it is to understand them. Lines of code per unit can be used as a metric.

• Complexity: A system with high complexity is hard to maintain. Cyclomatic complexity is a good metric.

• Unit interface size: A unit with many parameters indicates bad encapsulation. Number of parameters per unit can be used as a metric.

• Coupling: Tightly coupled components introduce difficulty when change is needed. Number of incoming calls per component can be used as a metric.

These scores are important for developers because they give an indication of the quality of their software and whether it possesses desirable characteristics. Developers can use them to spot problems in the code before it grows too large and solving the problem becomes too expensive or difficult to handle. A problem that could occur with this approach is that undesirable characteristics might be discovered late in the development process.

To calculate these metrics accurately the development needs to be at a certain level of maturity, and at this point rewriting or refactoring is not necessarily something one would want to spend time on. The worst case would be discovering that the system needs a full rewrite due to high redundancy or coupling scores. To avoid this problem, guidelines should be followed when writing code. Clean Code offers a guideline for writing understandable code, but also code with increased quality, as we will discuss in the following section.


2.5 The Effect of Clean Code on Code Quality

Clean Code is a book about the craft of programming and how one can become a better programmer, and if we look closer we can see that Martin's rules not only aim to make code more understandable but also increase its quality. This means that by writing clean code the quality of the software should increase. Let us look closer at some of the properties and how Martin's rules help to achieve a good quality score:

2.5.1 Redundancy

Don't repeat yourself is one of the rules we discussed in section 2.2, and following it will lead to code with less redundancy. This rule addresses the property redundancy directly. One could say that the property measures how good a job one has done following Martin's rule about duplication.

2.5.2 Unit Size

There is a clear correlation between the property unit size and Martin's rule functions should be small. The more the rule is followed, the smaller the average function size. Martin even argues that a function should be no longer than a couple of lines[17, p. 34].

2.5.3 Complexity

When measuring complexity, cyclomatic complexity is a very common metric[4]. Cyclomatic complexity was developed by Thomas J. McCabe[18] and is a quantitative measure of the number of linearly independent paths through a control flow graph of a program. To calculate a function's cyclomatic complexity we use the following formula:

M = E − N + 2

E is the number of edges in the graph and N is the number of nodes. Decision points, such as if statements, add branches to the graph; an if statement with multiple predicates counts as one decision point for each predicate. McCabe also shows that the cyclomatic complexity of a program with one entry point and one exit point is equal to the number of decision points plus one[18]. For functions with multiple exit points it can be calculated with the following formula:

M = D − S + 2

D is the number of decision points and S is the number of exit points.
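The decision-point form of the formula can be sketched with a naive keyword count. Real tools build the full control flow graph; this token-counting estimator is only an approximation for illustration, and the snippet it analyzes is invented:

```java
// A naive sketch of McCabe's decision-point count: for a function with
// one entry and one exit, M = D + 1, where every if/while/for/case and
// every extra boolean predicate (&&, ||) adds a decision point.
public class ComplexityEstimator {

    static int estimate(String source) {
        String[] decisionTokens = {"if", "while", "for", "case", "&&", "||"};
        int decisions = 0;
        for (String token : decisionTokens) {
            int from = 0;
            // Count every occurrence of each decision token in the source.
            while ((from = source.indexOf(token, from)) != -1) {
                decisions++;
                from += token.length();
            }
        }
        return decisions + 1; // M = D + 1
    }

    public static void main(String[] args) {
        // Two if statements, one of them with two predicates: D = 3, so M = 4.
        String snippet = "if (a > 0) { x(); } if (b > 0 && c > 0) { y(); }";
        System.out.println(estimate(snippet));
    }
}
```

A straight-line function with no branches scores 1, the minimum cyclomatic complexity.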


Another advantage of calculating the cyclomatic complexity of a function is that it indicates the number of test cases needed to achieve full test coverage. The score gives the upper bound for achieving full branch coverage, and the lower bound for achieving full path coverage of the function.

Martin does not discuss how to avoid a high cyclomatic complexity, but as we shall see, following his rules leads to code with a better complexity spread. Most important are the rules functions should do one thing and functions should be small. If a function has a high cyclomatic complexity it may be a sign that it is trying to do more than one thing, and its size also naturally grows as a side effect. So a high cyclomatic complexity violates two of Martin's rules and reduces the quality of the code. But if the decision points are extracted into separate smaller functions, the complexity is spread across several functions, each with a low complexity. The overall complexity of the system is not reduced, but dealing with small, less complex functions is better than dealing with large, highly complex functions. This way Martin's rules are followed and the quality of the software is increased.

Martin's rule meaningful names is aimed at giving functions meaningful and descriptive names. If a function has many decision points it may prove difficult to describe them all in a name.

2.5.4 Volume

Clean Code does not help reduce the volume of a system, but if the rules are followed the size will not be as much of a threat to maintainability. The volume of the system might grow due to Clean Code, but the rules we discussed in section 2.2 are meant to help write code that is understandable and easy to work with. So while the volume increases, the maintainability also increases.

2.5.5 Unit Interface Size

Martin discourages functions with more than two parameters[17, p. 40]. He prefers no parameters, and following this gives a good score on the property unit interface size.
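One common way to shrink a wide interface is to group related parameters into a single argument object. The following sketch is an invented example (the UserDetails type is hypothetical, not part of the experiment system):

```java
// Reducing unit interface size: three separate parameters are replaced
// by one argument object, so the parameters-per-unit metric drops to one.
public class ParameterObjectExample {

    // The argument object groups related parameters into one unit.
    static class UserDetails {
        final String username;
        final String email;
        final String uuid;
        UserDetails(String username, String email, String uuid) {
            this.username = username;
            this.email = email;
            this.uuid = uuid;
        }
    }

    // Before: three separate parameters.
    static String describe(String username, String email, String uuid) {
        return username + " <" + email + ">";
    }

    // After: a single parameter, delegating to the original logic.
    static String describe(UserDetails user) {
        return describe(user.username, user.email, user.uuid);
    }

    public static void main(String[] args) {
        UserDetails ada = new UserDetails("ada", "ada@example.org", "u-1");
        System.out.println(describe(ada));
    }
}
```

The behavior is unchanged; only the shape of the interface improves, which is what the unit interface size property measures.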

2.5.6 Coupling

To achieve decoupled code, Martin talks about the Dependency Inversion Principle (DIP): "In essence the DIP says that our classes should depend upon abstractions, not on concrete details"[17, p. 150]. What this means is that high-level modules should not depend directly on low-level modules; there should be a level of abstraction in between that the high-level modules depend on instead. The low-level modules are then based on this abstraction layer. This way the modules are isolated from each other, and changes in a low-level module won't affect the high-level module. We achieve less coupling and a more flexible and testable system when following the DIP.
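The principle can be sketched with invented types: a high-level ReportService depends only on a Storage abstraction, so the concrete low-level implementation can be swapped without touching the high-level module. None of these names come from the experiment system:

```java
// A minimal Dependency Inversion sketch: both the high-level and the
// low-level module depend on the Storage abstraction in the middle.
public class DipExample {

    interface Storage {                 // the abstraction both sides depend on
        void save(String data);
        String load();
    }

    static class InMemoryStorage implements Storage {  // a low-level detail
        private String data = "";
        public void save(String data) { this.data = data; }
        public String load() { return data; }
    }

    static class ReportService {        // high-level module
        private final Storage storage;  // depends only on the abstraction
        ReportService(Storage storage) { this.storage = storage; }
        void publish(String report) { storage.save(report); }
        String latest() { return storage.load(); }
    }

    public static void main(String[] args) {
        // Swapping InMemoryStorage for any other Storage implementation
        // requires no change to ReportService -- that is the decoupling.
        ReportService service = new ReportService(new InMemoryStorage());
        service.publish("weekly totals");
        System.out.println(service.latest());
    }
}
```

This is also why the DIP makes systems more testable: a fake Storage can be injected in unit tests.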

Now we have drawn similarities between Clean Code, which can be viewed as a subjectively good way to write code, and the objective metrics for maintainability quality. This suggests that Clean Code is not just subjective but also contributes to increasing the objective quality.

The four points made by Baggen et al.[4] can also be compared to Clean Code. We can summarize the points in one sentence: software with high quality is easy to work with and should not be surprising. This is how Martin describes Clean Code.

2.6 Related Work

Now that we have established the necessary background knowledge, we can present and discuss some of the available related work. In particular, we want to look at research into understandability and the effect of refactoring, as this is closely related to what we have researched. Research on refactoring covers several different areas, such as how to refactor[11, 17], the effect on code quality, productivity, etc. We did refactoring to increase understandability and quality, but we strictly followed Clean Code when doing so. Therefore it is very interesting to see what others have found and concluded when they have studied the effect of refactoring.

2.6.1 The Effect of Refactoring on Code Quality

There have been several studies on refactoring and its effect on code quality, but the results differ. Some conclude that refactoring improves code quality[9, 19, 22], while others conclude the opposite[2, 14, 21]. Despite differing results, all of the studies are carried out in a very similar manner.

First, they establish the refactoring techniques to be used. The techniques proposed by Martin Fowler in his book "Refactoring: Improving the Design of Existing Code"[11] are mainly used. For example: "Extract Method"[11, p. 110], "Extract Superclass"[11, p. 336], "Move Method"[11, p. 142], "Replace Data Value with Object"[11, p. 175].


Secondly, they calculate the quality score of one or more systems before refactoring and then calculate it again afterwards. We discussed some of the metrics used to score the systems in section 2.4. Finally, the two scores are compared.

Why these very similar studies come to different conclusions is difficult to say. The original state of the systems they refactor could be a factor, or how skilfully the refactoring is done. But it shows that refactoring might not always increase code quality.

A common problem with these studies is the quantitative side. They refactor few systems, sometimes just one, and decide whether refactoring benefited code quality. The conclusions are also based solely on the calculated quality score. We argued in section 1.1 that time could be a better metric for understandability, as it is difficult to measure how descriptive and informative code is for a developer.

As we mentioned, the refactoring techniques used by the research presented in this section are based on the techniques of Martin Fowler. These techniques resemble the techniques we discussed in section 2.2, such as the "Single responsibility principle" and "Don't repeat yourself". One difference, though, is that Fowler's techniques are meant to improve the design of existing code. Clean Code, on the other hand, describes how one should write code to attain understandable, maintainable and readable code. So the approach is different, but the results should be much the same. In section 2.5 we discussed how Clean Code could help increase code quality. The results from these studies suggest that our reasoning is correct, but as the results vary we cannot conclude anything.

2.6.2 The Effect of Refactoring on Productivity

Moser et al.[19] studied the effect of refactoring on productivity and quality. Their approach was very different from the one we described in section 2.6.1. By conducting a case study in a close-to-industrial environment, they found that refactoring increases quality and productivity.

Moser et al. follow a newly started agile development process over eight weeks, divided into five iterations lasting from one to two weeks each. As part of the development process the team had to solve two user stories focused on refactoring: the first story during the second iteration and the other during the fourth. This way they could compare productivity and quality between iterations. Productivity was measured as the number of lines of code added during each iteration. Moser et al. observed that the average productivity was higher in the iterations after refactoring than in the other iterations.


The fact that this study is a case study in a close-to-industrial environment makes their findings especially valuable. Observing the effect of refactoring on a software project over a longer period gives a completely different type of data than the studies we discussed in section 2.6.1. From the data produced one could also tell whether refactoring helps with maintaining the quality of software. Moser et al. suggest that "refactoring of a software system prevents in a long term its deterioration".

There are some problems with Moser et al.'s research which make it difficult for us to see how their study proves that understandability is also increased by refactoring. Firstly, they focused on productivity, with lines of code added as the measurement, in a small newly started project. Even though many lines of code are added, it does not necessarily mean that the code is more understandable after refactoring. As the project is new, high productivity is expected while the project takes shape. Secondly, the same team of developers participated throughout the case study. An increase in productivity could then also mean that the developers were becoming more accustomed to the source code.

2.6.3 The Effect of Refactoring on Understandability

When we started working on this thesis we could not find any research that studied the effect of refactoring or Clean Code on understandability in any other way than by calculating the code quality score. This led us to use time as the measurement for our research. After we started our work, however, Erik et al.[3] published a study on the effect of refactoring on understandability in which time was used as the measurement. Unfortunately this research did not come to our attention until the later stages of our work. Had it come to our attention earlier, we could have built on their research.

The study performed refactoring on a system with characteristics of legacy code. They used refactoring techniques from both Martin Fowler[11] and Robert C. Martin[17], and categorized the refactorings based on the number of classes they added. They used three categories: small, medium and large. If no classes were added it would go into the small category, one class added would qualify as medium, and two or more classes would be categorized as large. They then conducted five different experiments where an experimental group solved assignments on the refactored code and a control group on the original code. For each experiment they created an assignment in the form of either bug fixing or changing functionality. The average times the groups used on an assignment were then compared to determine the effect of the refactoring.

The research is similar to ours in many ways, but on a larger scale. Their refactorings made larger changes to the system, as they extracted code and created several new classes. In their most substantial refactoring they split a class with 350 lines of code into multiple new classes. Furthermore, they conducted several experiments and had more participants. Even so, as we discuss in section 6.5, our study supports their findings.


Chapter 3

Research Method

This chapter presents our research method. We discuss experimental research design, how we implemented our experiment, and present the participants. After the first run of the experiment we found reason to make some adjustments to the implementation of the experiment for the second run. To make it clear why we made these changes to our method, we present the problems we experienced in section 3.5. In section 3.5.4 we discuss what we learned and what actions we decided to take.

To answer our research questions we acquired a system that we found to contain code smells[11, 17] and to violate many of the different rules and principles described in Clean Code. We refactored a class in this system, removed the code smells and made it more clean. This system and the refactoring we did are presented in chapter 4. Then we conducted an experiment.

The expected result of this experiment is that the refactored version of the system is more understandable. Therefore participants should require less time to solve the assignments.

3.1 Experimental Research Design

Experimental research is used to understand causal relationships by manipulating one or more variables, while controlling and measuring any change in other variables[6, p. 290]. The variable manipulated is the independent variable, and its values are set by the experimenter. It is a variable independent of the participants' behavior. To manipulate an independent variable, it needs to have at least two levels that participants can be exposed to. When manipulating the independent variable, the goal is that it causes change in the recorded variable. This is the dependent variable: a variable dependent on the behavior of the participants. If there is a causal relationship, the value of the dependent variable will to some extent depend on the level of the independent variable.

The most basic experimental design exposes one group of participants to a level of the independent variable, while not exposing another group[6, p. 109]. The exposed group is called the experimental group, and the other group the control group. The control group's behavior is the baseline against which the experimental group's behavior is compared.

The goal of our research is to see if Clean Code has any effect on understandability. Clean Code is the independent variable, and understandability the dependent variable. As we are able to manipulate the independent variable, and want to measure and observe the dependent variable, an experimental research design is suitable[6, p. 109]. To measure the effect of the independent variable on the dependent variable, we measure the time the participants spend on solving coding tasks. If Clean Code does increase understandability, the time participants use should decrease.

Experimental designs can be categorized into three basic types[6, p. 291]:

• Between-subjects design

Participants are randomly assigned to the different levels of the independent variable. In other words, they are part of the control group or the experimental group, but never both. The data are averaged across the subjects.

• Within-subjects design

Participants are exposed to all levels of the independent variable. They are part of both the experimental group and the control group. The data are averaged across the subjects.

• Single-subject design

Participants are exposed to all levels of the independent variable. Instead of focusing on changes in groups, the focus is on single subjects or a small number of subjects.

In this study the independent variable only has two levels: whether the code is written following the Clean Code principles or not. This means we only need one experimental group and one control group. The single-subject design focuses on the effect the independent variable has on individuals, whereas we are interested in the general effect of Clean Code. Therefore, between-subjects or within-subjects designs are more suitable.

Were we to use the within-subjects design, we would need to refactor at least two different parts of the system and create similar assignments for both. Using only one part and one set of assignments would not work due to the carryover effect[6, p. 306]: after finishing the first test, the participants would know all the answers and the second test would be meaningless. Refactoring two parts is time consuming and a job for future research. Therefore we found it better to use the between-subjects design.

3.1.1 Randomized Two-Group Design

In this study we chose a randomized two-group design, as its simplicity best fits our case.

When using the randomized two-group design, the first step is to randomly assign the participants into two groups. This randomization helps deal with error variance such as individual differences. These groups are then exposed to different levels of the independent variable. When conducting the experiment, other variables should be kept as constant as possible between the groups. Finally, the means are calculated from the results and analyzed[6, p. 294].

The design is very simple and has some clear advantages[6, p. 296]:

• It is simple to carry out.

• No need to pretest or categorize subjects.

• It requires few subjects.

• It naturally randomizes subjects across the groups.
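The random assignment step itself is straightforward to implement. A minimal sketch, with placeholder participant names and a fixed seed so the example is reproducible:

```java
// Random assignment for a two-group design: shuffle the participant
// list, then split it in half into an experimental and a control group.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class RandomAssignment {

    static List<List<String>> split(List<String> participants, long seed) {
        List<String> shuffled = new ArrayList<>(participants);
        // A seeded Random makes the assignment reproducible for the example;
        // a real experiment would use an unseeded source of randomness.
        Collections.shuffle(shuffled, new Random(seed));
        int half = shuffled.size() / 2;
        return List.of(shuffled.subList(0, half),
                       shuffled.subList(half, shuffled.size()));
    }

    public static void main(String[] args) {
        List<String> participants = List.of("p1", "p2", "p3", "p4", "p5", "p6");
        List<List<String>> groups = split(participants, 42L);
        // Every participant ends up in exactly one of the two groups.
        System.out.println(groups.get(0).size() + groups.get(1).size());
    }
}
```

Because each participant appears in exactly one group, this matches the between-subjects design chosen above.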

One of the downsides of this design is that it provides little information about the effect of the independent variable. The result only tells whether the variable had an effect. We can see if Clean Code had an effect on understandability, but not in what way. We cannot tell whether Clean Code decreased the time used because it makes code more maintainable, and so on. This is out of scope for this study and something future research can study in more detail.

3.1.2 Dealing With Error Variance

There are other variables that could have an effect on our dependent variable. These variables create error variance. Error variance could be a problem, as it affects our ability to determine the effectiveness of our independent variable[6, p. 293]. The variables most threatening to this study are individual differences between participants and the implementation of the experiment.


We are dealing with individual differences by assigning the participants to groups at random. This way the individual differences should be evenly distributed across the groups.

The implementation of the experiment could introduce error variance if it is not consistent. It is important that all runs of the experiment are as consistent as possible. All information given to participants should be standardized so as not to affect the variables.

3.2 Implementation of the Experiment

The participants were randomly divided into two groups and asked to solve three code assignments. One group was assigned to the original version, and the other group to the clean version of a system. This system and the refactoring we did to create the clean version are presented in chapter 4. The participants were not aware that there were two different versions.

To record the time each participant spent on each assignment, they were given a form to report the start and finishing times. They then calculated the time spent on each assignment once they had finished all three. To verify that the assignments were solved correctly, the participants were instructed to raise their hand when they thought they had solved an assignment. The solution was then checked and the participant told whether the answer was correct or not. If the solution was incorrect, they continued with the assignment.

Another approach we considered was to write several unit tests that the solutions had to pass for the assignments to be approved. We decided against this because we felt that unit tests would point the participants too much in the right direction and perhaps spoil the results.

We did two separate runs of the experiment to keep the groups small, so we could be more available during the experiment. We wanted to be available so we could check solutions as fast as possible. If participants have to wait for our check and this adds to their time, it contributes to error variance. After the first run of the experiment we made some changes to the implementation for various reasons. We discuss this in more detail in section 3.5.4.

Unfortunately, the participants were not able to run the system while they were working on it. This is mostly because the system does not work properly, since we have stubbed or completely removed many vital parts. We had to do this to be able to use the system for the experiment, as it is part of the agreement we have with the owner of the system. Another reason is that the system is an API, so just running it is not enough. One would have to execute different requests to test it, and this is something we did not want the participants to spend a great deal of time on.

3.3 The Assignments

For the experiment we designed three different assignments to answer our sub-research questions. Each assignment focuses on its own question. We tried to make the assignments simulate work one would experience as a developer. Implementing new functionality, changing functionality and bug fixing are tasks a developer would perform during a normal day of work.

3.3.1 Assignment 1: Implementing New Significant Functionality

"Right now this user-API does not support updating a user. Your first assignment is to implement this. Remember: The user must be authorized. A user's username, email and uuid must all be unique."

The first assignment is to implement new functionality in the system, namely the updateUser function, see listings A.2 and B.2. We removed the function entirely from both versions, along with the unit tests created for it in the refactored version. This is the largest assignment and the one we expect the participants to use most of their time on. The assignment tests the difference in how well the two code styles are understood when working with them for the first time, and how difficult it is to add new code to them. The result will reflect the ability to reuse code and how understandable the system is overall. As this is the first assignment, it is expected that much of the time used on this assignment is spent learning the system and how it works.

The expected solution is the same in both versions of the code:

• The updated values in the user-object must be validated as legalvalues.

• If the username or email has changed, it must be validated as unique.

• The user doing the update must be authorized.

Updating a user is much like creating one; therefore looking at how a user is created can be of much help to the participants. They were not given this advice, but we expect them to see the similarities. Because of this, the assignment can be solved by copying much of the logic from createUser, see listings A.1 and B.1. createUser and updateUser share a lot of functions, so there is not very much coding needed, just the additional validation of unique username and email. This is true for both the original version and the refactored version. This makes the assignment consistent, as there is no clear advantage to either of the versions.

3.3.2 Assignment 2: Change Current Functionality

"Almost every function takes a parameter called "userId". Right now the API treats this as a UUID. Your assignment is to change the functionality so all of these functions also try to fetch a user by username if they fail to find a user by UUID. In other words, after this change "userId" could also be a username."

The second assignment aims to test much of the same as assignment 1: the understandability of the existing functionality in the two versions and how easy it is to identify reusable code. But there is a major difference from assignment 1: this assignment also tests how changeable the code is. Currently almost every function in the API requires the UUID of a user to fetch it from the database. The assignment is to modify the functions to also be able to fetch users by username.

The expected solution is to write code that tries to find a user by username if it fails to find one by UUID. This is a short assignment, but the amount of work needed differs between the versions. In the refactored version a function that does this already exists, see listing B.6, so the problem could be solved by using it. In the original version, on the other hand, one could write a similar function and use it, or change every function that fetches the user.

We expect the assignment to favor the refactored version, as it is provided with a function that can be used in the solution. This is intended, as having functions like this to reuse is one of the benefits of writing Clean Code.
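The expected fallback behavior can be sketched as follows. The UserDao interface, its method names and the FakeDao are invented for illustration; the real system's API (listing B.6) is not reproduced here:

```java
// A hypothetical sketch of a lookup helper with the behavior assignment 2
// asks for: try the id as a UUID first, then fall back to a username lookup.
import java.util.Map;

public class UserLookup {

    interface UserDao {
        String getUserByUuid(String uuid);      // null when not found
        String getUserByUsername(String name);  // null when not found
    }

    // A map-backed fake DAO, only for this illustration.
    static class FakeDao implements UserDao {
        private final Map<String, String> byUuid;
        private final Map<String, String> byName;
        FakeDao(Map<String, String> byUuid, Map<String, String> byName) {
            this.byUuid = byUuid;
            this.byName = byName;
        }
        public String getUserByUuid(String uuid) { return byUuid.get(uuid); }
        public String getUserByUsername(String name) { return byName.get(name); }
    }

    // After this change, "userId" can be either a UUID or a username.
    static String findUser(UserDao dao, String userId) {
        String user = dao.getUserByUuid(userId);
        return (user != null) ? user : dao.getUserByUsername(userId);
    }

    public static void main(String[] args) {
        UserDao dao = new FakeDao(Map.of("123e4567", "Ada"), Map.of("ada", "Ada"));
        System.out.println(findUser(dao, "ada"));      // found via the fallback
        System.out.println(findUser(dao, "123e4567")); // found directly by UUID
    }
}
```

With such a helper in place, every function that fetches a user only needs to call it, which is why the refactored version is expected to have an advantage.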

3.3.3 Assignment 3: Bug fixing

"There is a bug preventing users from being created with the "createUser" function. Your assignment is to find this bug."

The third and last assignment is about bug fixing. The first two focus on working with the code and implementing new code, but now one needs to really understand what is going on in the code to reveal the bug.


Listing 3.1: Inserted Bug

    if (dao.getUserByUsername(newUser.getUsername(), false) == null) {
        logger.info(append("method", "createUser"),
            "Username is already in use!");
        throw new ValidationException(
            new Error("duplicate_unique_property_exists",
                "Unique check failed", "This username is already in use!"));
    }

The bug is not part of the original system in any way; we have inserted it into the code. It is a logical error inserted into the validation of the username during creation. The error makes the system fail whenever someone tries to create a new user with a username that is not in use, but it will succeed if the username already is in use. See listing 3.1.

The expected solution is to identify the bug and change the conditional from equal to not equal. The simplest approach to finding the bug differs between the two versions. As the original code has all the logic of creating a user inside the createUser function, all one has to do is go through the function carefully, step by step, until the bug is discovered. The refactored version's createUser has been split into several smaller functions that do all the work, so one has to go through each of them carefully until the bug is discovered.
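
The fix described above, flipping the conditional from equal to not equal, can be sketched as follows. This is a minimal, hypothetical stand-in: the real system's dao, ValidationException and user types are replaced by a simple in-memory UserStore, and only the corrected conditional from listing 3.1 is shown.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the system's DAO; names are illustrative.
class UserStore {
    private final Map<String, String> usersByName = new HashMap<>();

    void addUser(String username) {
        usersByName.put(username, username);
    }

    String getUserByUsername(String username) {
        return usersByName.get(username);
    }

    // Corrected validation: reject creation only when a user with the
    // same username already exists. The inserted bug used "== null" here,
    // which rejected exactly the usernames that were free.
    void validateNewUsername(String username) {
        if (getUserByUsername(username) != null) {
            throw new IllegalStateException("This username is already in use!");
        }
    }
}

public class BugFixSketch {
    public static void main(String[] args) {
        UserStore store = new UserStore();
        store.validateNewUsername("alice"); // free username: passes
        store.addUser("alice");
        try {
            store.validateNewUsername("alice"); // taken username: rejected
        } catch (IllegalStateException e) {
            System.out.println("duplicate rejected");
        }
    }
}
```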

There is no help in this assignment from logs or any other kind of error messages. The assignment only states that there is something wrong preventing users from being created. Because of this we do not expect the assignment to clearly favor either of the two versions. It might be a little easier to spot the bug in the original version, since in the refactored version the bug is inserted deeper into the code, behind two function calls. It is also possible for the participants working on the refactored version to write unit tests to discover the logical error, instead of looking through the code.

3.4 The Participants

We had a total of 18 people participating in our experiment. Most of them are graduate students; the other four are experienced developers who volunteered from a software company. We had hoped more would participate so we could get data from both students and experienced developers. This would have been interesting because a student might not yet have developed a preference for a specific code style, while an experienced developer might be used to working with understandable code. Had we had more developers, it could have been possible to see whether there were any noticeable differences in what students and experienced developers feel is understandable code.


Even though we had more students participating, we still had fewer than we had hoped. Why it proved difficult to recruit participants is hard to say, but we noticed during the first run of the experiment that we had to use graduate students, due to the need for experience with larger systems. This greatly reduced the number of people we could use. In addition, the graduate students were all working on their own theses, and they might have felt that they did not have time for someone else's experiment.

3.5 First Run of the Experiment

During our first run of the experiment we experienced some problems that possibly created noise in the results. We deemed some of these issues harmful enough that we had to make changes to the implementation of our experiment for the next run. We discuss the changes made in more detail in section 3.5.4. In the following subsections we go through each assignment and discuss flaws we discovered and feedback we received from the participants. The results of the experiment are presented in chapter 5.

In the first run of the experiment we had eight students participating. These eight were randomly divided into two groups. All eight participants were graduate students, so the groups should be more or less even in skill and knowledge. Which version each group worked on was decided by a coin toss. The group that was assigned the original code was named group A, and the group assigned the refactored code was named group B.

3.5.1 Assignment 1

As this was the first assignment, most of the problems the participants experienced were things such as importing the system into an IDE, or finding the correct class to work in. These things added noise to the results, as many participants added the time used to import the project and find the class to the time they used to actually solve the assignment. This was an unforeseen problem, and we decided to fix it for the next experiment.

Other than that the assignment worked as expected.

3.5.2 Assignment 2

For the first run, the assignment text was slightly different from the one presented in section 3.3.2. The assignment text read: "Right now the functions in our user-API uses the parameter "userId" to find a user by uuid, but userId could also be a username. Implement the functionality to support this."

There were few problems with this assignment, just some minor misunderstandings about the assignment text. Some participants did not understand what the assignment was asking them to do and needed some extra explanation. Because of this, the text needed to be more explanatory and go into more detail.

3.5.3 Assignment 3

Again, the assignment text for this run was slightly different from the one presented in section 3.3.3. The assignment text read: "There is a bug in createUser. Find it."

We had some problems with this assignment text. Many of the participants from group B told us that they misread the text to mean "inside" the function createUser. They were only looking at the function calls, and not further exploring the functions called. So they based their answers on the order of the calls, and not on what the called functions do.

They also looked for other functions throughout the system that they thought the assignment referred to. Group A did not have this problem, as all the logic was inside createUser. They did as expected and read the function line by line, eventually discovering the logical error.

Whether group B misread the assignment because of their inexperience with Clean Code or larger projects, or because of the text itself, is hard to say. Either way, it provided valuable feedback on the assignment text.

3.5.4 Learnings From the First Run

After the experiment we had a short discussion with the participants about the experiment. They felt the size of the project was problematic. With so many classes and files it was confusing and difficult to know where to start. The assignment texts were deliberately written so as not to tell the participants where they should start. This was to create a more realistic situation, but it may also have damaged the results, as the times used by participants to discover the problem area differed. The problem is that participants located the correct class at random times, so we cannot know if the time recorded represents locating the class or solving the assignment. A participant could spend a lot of time finding the correct class, but very little solving the actual assignment, or the other way around.


This creates questionable results, as the time used to solve the assignment disappears in the time used to find the correct class. We are only interested in the effects of the refactored code. Therefore we decided that the assignment texts should clearly state where they should be solved.

Because the size of the system proved to be challenging for this group of graduate students, we decided that for the next run of the experiment we should only use participants at master's level or higher. This meant acquiring volunteers would be more difficult, but it assured us that they would be capable of solving the assignments.

The main points of what we learned after this run were that the assignment texts needed to be more precise and explanatory, and that we should point to the class in which all the assignments are to be solved. As we discuss in section 4.1, we only refactored this one class in the system. Based on the feedback from assignment 3, we changed the text to be more explicit by stating that there is something wrong with the process of creating a user. We hoped this alteration would make the participants go through the function calls and debug them as well.

When rewriting the assignments to be more precise and explanatory, it is important that they are not written too prescriptively. We do not want to guide the participant to the solution step by step, but we do want to point them in the right direction. We want the text to make them understand that what they need to solve the assignment is either already in the class or needs to be created by them. Since we only refactored this one class, as discussed in section 4.1, we do not want participants to dig through other parts of the system. The new assignment texts are presented in sections 3.3.2 and 3.3.3.

Checking the participants' solutions ourselves and deciding whether they pass is also something we decided against after the first run. The reason is that participants were given different hints and pointers when their solution did not pass. In other words, it was difficult to keep all variables consistent and standardized. It was also difficult to keep participants from overhearing other proposed solutions, leading them to the answer. To avoid this, we decided that the participants should themselves decide when the assignment is solved, and deliver their work when they finish. We then go through their solutions and check their answers afterwards. If an answer is wrong, or missing important logic, we have to take this into consideration when comparing the results. The error rate represents additional data we can use.

After making these changes to our experiment, we hoped that the size of the project would not be as much of a problem, as the participants would not have to look through it to find the correct class. Furthermore, the assignment texts should be clearer and not cause confusion or misunderstandings. We also hoped to eliminate as much noise as possible in the results, especially the time used to locate the problem areas.


Chapter 4

System and Refactoring

This chapter introduces the code base we used for our experiment, and why we found it suitable. We point to different problems with the code in section 4.1, and in section 4.2 we discuss the refactoring done to make the system more clean.

To be able to make the system clean we studied Robert C. Martin's book Clean Code[17]. We compared Clean Code to code quality and its different metrics to make sure that Clean Code also increases the objective quality of software. This is presented and discussed in chapter 2. We performed several types of refactoring, from simply renaming variables and functions to extracting code in order to divide responsibility and reduce duplication.

All of the functions we discuss in section 4.2 are presented in their original form in appendix A. The refactored versions are presented in appendix B.

4.1 The System

We have worked on a part of a larger system written in Java. It is a REST-API handling create, read, update and delete requests (CRUD-requests) from users of the system. Even though it is just one part of the entire system, it is nonetheless a sizeable part. Apart from handling requests it also authorizes users, queries the database, sends emails and so on. To be able to do all this, it depends on several different modules and libraries.

Because of this we decided to focus on the API for managing users. This API does not depend on too many modules, and it is straightforward in what it does. Typically it handles CRUD-requests and some other requests, such as exporting users and getting the roles a user has in the system. It is a sizeable class with long functions doing more than one thing. There is also much duplication of code and unnecessary complexity throughout the class.

In addition, this class has some other problems. It resembles "Legacy Code"[10], it contains "Code Smells"[11, 17], and it breaks several of Martin's rules and principles in Clean Code. In the following sections we briefly discuss what Legacy Code and Code Smells mean, and then why, in our opinion, this system suffers from these problems.

4.1.1 Legacy Code

The noun legacy is defined as "something handed down by a predecessor"[1]. It is something old, but not necessarily bad. In the software development world, legacy can mean many different things, and it mostly carries a negative connotation. "Legacy code" and "legacy system" are terms most developers dread to hear.

Michael Feathers defines legacy code as "code without tests"[10, p. xvi]. Code without tests is the worst-case scenario, as we discussed in section 2.2. Without tests one cannot verify whether the code is changed for better or worse, so change becomes difficult and time consuming.

In his book, Feathers describes typical characteristics of legacy code[10]:

• Classes and functions in a legacy system are large and complicated.

• Figuring out what to do and where to do it in a legacy system can be difficult. Implementing the change is also a problem.

• Legacy code often has duplication. Whenever a change is introduced, it has to be implemented in several places.

"Legacy system" is mostly used to describe outdated systems, written by a previous generation of developers. These systems can be difficult to change and hard to understand[10, p. 77], mostly due to their tangled and unintelligible structure and outdated technology. Even though these systems are outdated, they are often still widely in use. Legacy systems are also highly cost-inefficient, as making a small change is difficult and time consuming.

The term legacy code is used to describe code that has the same characteristics as legacy systems. It can be old code that is part of a system that is constantly updated and still evolving. Legacy code is that old, hairy piece of code that no one dares to touch for fear of breaking something. This old code can be a burden on the evolution of the system, and slow down development.

Even though we speak of legacy code as old code, that is not always the case. If a company finds that its product is popular and the users are requesting a great deal of new features and fixes, the developers may rush these out to keep the product popular and growing. This can lead to poorly written code with many of the same characteristics as legacy code. "The Beast", as we discussed in section 2.2, is a good example of such legacy code, with its long and complex functions.

4.1.2 What is Legacy in the System?

If we try to identify the characteristics given by Feathers, it is easy to see that the class matches the first in his list. The functions in our class are long. They are all above 20 lines, and some are close to 100 lines. In addition, there are many if statements and multiple exit points.

One of the most notable reasons for all these if statements and exit points is duplication; number three on Feathers's list. Every function executes the same authorization check. The check is not very complicated, but it introduces both an if statement and an exception. Since this is the user-API there are a lot of fetch operations for user-objects from the database, which are also executed by many of the functions. Duplication is one of the most substantial problems in the class.

The size and the duplication make the class unnecessarily complicated. We expect this to make it harder to make changes, to debug, and to find where to make the changes. If this proves to be true after the experiment, it also matches Feathers's second characteristic.

Furthermore, there are no unit tests. By Feathers's definition, the class is legacy code based on this alone. Since the class matches two of the three characteristics, and there are no unit tests, we classify the class as legacy code.

4.1.3 Code Smells

Martin Fowler uses the term "Code Smells" in his book "Refactoring: Improving the Design of Existing Code"[11, p. 75] to describe problems with code that point to a need for refactoring. A code smell is a sign that there is something wrong with the system, possibly a deeper problem. These problems are not bugs or something that breaks the system, but rather signs of bad design and implementation. Fowler lists several smells in his book[11, p. 75] that indicate a need for refactoring:


• Duplication

• Long Classes

• Feature Envy

• Comments

In addition to Fowler’s definition of code smells, one could also say that code smells point to violations of principles and poor quality. Our definition is a mix of the two, as we point to violations of the principles and rules described in Clean Code, as well as some of Fowler’s smells, in our system.

4.1.4 The Different Code Smells in the System

Functions Should Do One Thing

Martin states that functions should do one thing, and one thing only[17, p. 35]. He is not saying that a function should only contain one line, but rather that a function should only operate on one level of abstraction; specifically, the level one below the function’s name. Many of the functions in this class violate this by operating on several different levels of abstraction. A good example is the createUser function, listing A.1. This function does more than just creating a user:

• First of all, it validates the values given by the user. If one of these proves to be invalid, the function aborts by throwing an exception.

• The user’s location is determined using the IP address.

• After the user has been created, the function tries to log the user into the system.

Each of the points listed above operates on its own level of abstraction. Having them in the same function obviously makes the function operate on more than one level of abstraction.

Descriptive Names

Most of the names used for variables and parameters are descriptive enough. They state their purpose and functionality. The problem is the lack of descriptive function names. This is mostly due to each function doing all the logic by itself. There are no small helper functions with names that can help with understanding what these large functions are doing. This code smell warns about more code smells: the functions are not doing one thing, they have more than one responsibility, and they contain possible side effects.

Misplaced Responsibility

This code smell goes hand in hand with "functions should do one thing". If a function is doing more than one thing, it is taking on too much responsibility.

Duplication

We mentioned in section 4.1.1 that the class suffers from duplication. The duplicated code mainly performs authorization and fetching of user-objects from the database. These are not long pieces of code, but they introduce noise and complexity.

4.1.5 Summary

After identifying both characteristics of legacy code and different code smells, we judge the chosen system to be suitable for our experiment. It suffers from some problems, but these do not make the system impossible to work with. If the problems were too large the experiment would not have the same credibility, as the assignments would not be fair. Comparing a very messy system, with a lot of problems, to a clean system would not prove much: almost any system with fewer problems would likely prove easier to work with.

4.2 Refactoring

We discussed in section 4.1 that we decided to refactor one specific class of the system. The responsibility of this class is to receive and answer requests. Therefore it is fitting to abstract the logic that creates the answer out of the interface code and into an internal service. This new service is responsible for creating answers and executing processes for the REST-API. Fetching user-objects, saving user-objects, validation, authorization and so on are all part of the responsibility of the new service. We now have two layers of abstraction instead of one. Instead of one interface handling everything by itself, we have an interface handling the requests and a service layer beneath it, handling the logic and creating answers. Each layer has its own responsibility, thereby fulfilling the single responsibility principle.


4.2.1 Extracting Repeated Code

Many of the requests that the API handles require the same steps to create an answer. One example is validating that the user has the correct authority. Another is retrieving information about a user from the database. Extracting this functionality into separate functions is beneficial for several reasons. Most importantly, it greatly reduces duplication. If the way users are authorized changes, implementing the change is very time consuming and complex when the code is duplicated across the system. When the logic is extracted into a separate helper function, it only has to be changed in one place. There are other benefits of extracting: the logic can be explained in the form of function names, and the error handling is hidden. Instead of a confusing if statement we have a function call with a descriptive name.

Almost every function in the API does some work on a user-object, and therefore tries to fetch a user from the database. This little piece of code, listing A.4, was repeated in most of the functions. After realizing this we decided to extract it into a separate function.

Code listing B.4 shows the result of the refactoring. The new function retains the functionality: it does a null check and then throws an exception if the user is not found. By extracting this logic, we get rid of the noisy null check, as well as the creation of the exception, in almost every function. We also have to modify the code only once if a change is needed. The function also brings more safety into the system, as it never returns null. We discuss this further in section 4.2.4.
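
The shape of such an extracted fetch helper can be sketched as below. This is not the thesis's listing B.4 itself, but a minimal stand-in: the in-memory map replaces the real DAO, and NotFoundException replaces the system's exception type.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical stand-in for the system's "user not found" exception.
class NotFoundException extends RuntimeException {
    NotFoundException(String msg) { super(msg); }
}

public class UserFetchSketch {
    // Stand-in for the database.
    private final Map<UUID, String> db = new HashMap<>();

    void save(UUID id, String user) {
        db.put(id, user);
    }

    // Never returns null: the null check lives here, once, so callers
    // no longer need their own checks.
    String getUserByUUID(UUID id) {
        String user = db.get(id);
        if (user == null) {
            throw new NotFoundException("No user with UUID " + id);
        }
        return user;
    }

    public static void main(String[] args) {
        UserFetchSketch repo = new UserFetchSketch();
        UUID id = UUID.randomUUID();
        repo.save(id, "alice");
        System.out.println(repo.getUserByUUID(id)); // prints "alice"
    }
}
```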

Duplication With Minor Differences

Sometimes when trying to extract repeated code, one comes across repeated code with minor differences. We experienced this when we tried to refactor the validation of user fields when a user is created and updated. Both validate the same fields, but the validation of an updated user has some special cases. When we create a new user we always want to make sure that the username is not already in use, but when we update a user it is only a problem if the username has changed.

To eliminate all duplication, we tried to make a function that could serve both purposes: validate the username whenever a new user is created, and, when updating a user, only validate the username if it has changed. The result was code listing B.7. This function takes two parameters, a user and a boolean used to determine if the user is new. It then declares two booleans, one for the username and one for the email, and initializes them to true. If the user is being updated, these booleans are set according to whether the values have changed or not. Then, if the username has changed and there is a user with the same username, we throw an exception. We felt there were multiple problems with this solution.

Most importantly, this new function felt more complicated than necessary. During the refactoring we had to introduce more if statements, and make the existing ones more complicated, to maintain the original behavior. This made it harder to understand what the function was doing at a quick glance, especially since the function behaves differently depending on whether or not the user is being created.

After considering this we decided to try something different. This time we extracted the shared code into separate functions and created one validation function for created users and one for updated users. This way the functions read much more easily, and one can understand them after a quick look. Even though we now have two fairly similar functions, see listings B.8 and B.9, each is less complex and more descriptive than the single function we first created.

What Can We Learn From This?

What we did in our second attempt was to value code with good understandability over low duplication. We lowered the duplication, but allowed some in order to achieve better understandability. The first attempt removed all duplication, but it came at the cost of understandability. There is a fine balance here, though. Much duplication in order to achieve understandability might not be worth it, due to the high workload needed when change is introduced to the system. Low understandability in order to achieve low duplication has the same effect: implementing changes in a system that is difficult to understand can be time consuming.

But could we have seen the problems earlier, when we started refactoring? Did we break any of the rules in Clean Code? First of all, we introduced the boolean parameter creatingUser. This parameter is needed to trigger the validations to be done when the user is created. It is the parameter that decides the behavior of the function, and this can be confusing. Martin advises against using flags as parameters, because a flag is an indication of the function doing more than one thing[17, p. 41]. We could have used this rule as an indication that we were going in the wrong direction, and divided the function into two different functions right away.
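
The two attempts can be contrasted in a small sketch. The User type, the duplicate check and the email field are simplified stand-ins, not the real listings B.7–B.9; only the structure of the flag-parameter version versus the split version is meant to carry over.

```java
// Minimal stand-in for the system's user type.
class User {
    String username;
    String email;
    User(String username, String email) { this.username = username; this.email = email; }
}

public class ValidationSketch {
    // First attempt (discouraged): a boolean flag steers the behaviour,
    // a sign that the function does more than one thing.
    static void validateUserFields(User user, User existing, boolean creatingUser) {
        boolean checkUsername = creatingUser || !user.username.equals(existing.username);
        if (checkUsername && isUsernameTaken(user.username)) {
            throw new IllegalStateException("username in use");
        }
    }

    // Second attempt: one small, descriptive function per purpose.
    static void validateNewUserFields(User user) {
        if (isUsernameTaken(user.username)) {
            throw new IllegalStateException("username in use");
        }
    }

    static void validateUpdatedUserFields(User updated, User existing) {
        // Only check uniqueness if the username actually changed.
        if (!updated.username.equals(existing.username)
                && isUsernameTaken(updated.username)) {
            throw new IllegalStateException("username in use");
        }
    }

    // Stand-in for a database lookup.
    static boolean isUsernameTaken(String username) {
        return username.equals("taken");
    }

    public static void main(String[] args) {
        validateNewUserFields(new User("free", "a@b"));
        // Unchanged username on update: no uniqueness check, so this passes.
        validateUpdatedUserFields(new User("taken", "a@b"), new User("taken", "a@b"));
        System.out.println("both validations passed");
    }
}
```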

4.2.2 Better Names and Abstraction

Even though some of the code in the REST-API is not used by more than one function, we have moved it into separate functions. These functions are given descriptive names; the benefit of this is discussed in section 2.2. By moving the code we also keep our functions small and focused on doing one thing. Furthermore, by doing this we also fulfill the criteria of The Stepdown Rule[17, p. 37]. The rule states that every function should be followed by a function at the next level of abstraction. The point is to make the functions read somewhat like an article, explaining their use.
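
A toy illustration of the Stepdown Rule: each function is followed by the functions one abstraction level below it, so the class reads top-down like an article. The names and the greeting example are invented for the illustration, not taken from the system.

```java
public class StepdownSketch {
    // Highest abstraction level: what we want to produce.
    static String renderGreeting(String name) {
        return salutation() + " " + formatName(name);
    }

    // One level down: the pieces of the greeting.
    static String salutation() {
        return "Hello,";
    }

    static String formatName(String name) {
        // Capitalize the first letter, lowercase the rest.
        return name.substring(0, 1).toUpperCase() + name.substring(1).toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(renderGreeting("aDA")); // prints "Hello, Ada"
    }
}
```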

Authorization

The authorization of users, see listing A.5, is a good example of code that is very easy to extract from the original code, and where one can really see the benefits of doing so. First of all, the code is basically just an if statement. It is very short and it does not do much. It can also be difficult to understand why it is there and what it is doing at first glance, especially when it is mixed in with plenty of other statements.

When extracting this logic into a new function, we remove noise such as brackets and logging statements, but more importantly we introduce a name that describes the function. Just by looking at the function call, one immediately understands why it is there and what it does. The name also makes the function easier to debug, as it acts as a manual, telling us what the function is supposed to do. The result is listing B.5.

Update User

Let us look at the function updateUser. In the original code this function is cluttered with if statements and exceptions, as seen in listing A.2. Most of this is validation of the updated user object, checking whether the birthday is valid or if the username is in use; there is also error handling. With all this going on, it is difficult to get a clear overview of what the function is doing. One has to step through it line by line to make sure nothing is overlooked. To bring this function more in line with the Clean Code principles, we extracted more or less all of the logic into several smaller functions. All of these functions have a single responsibility and a descriptive name. We already discussed these functions: the authorization function throwIfNotAuthorized (listing B.5), getUserByUUID (listing B.4), and the validation of fields, validateUpdatedUserFields (listing B.9).

The result is a much tighter and smaller function, see listing B.2. This function is much easier to read and follow than the original version. It authorizes the user, fetches the user from the database, validates the fields and then updates the user. This is quite clear from the function calls, thanks to their names. The function is also easier to debug if something should go wrong: if there is something wrong with the validation, one knows immediately where to start looking.
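
The shape of such a refactored updateUser can be sketched as follows. The helper names mirror those discussed above (throwIfNotAuthorized, getUserByUUID, validateUpdatedUserFields), but their bodies here are deliberately minimal stand-ins, not the thesis's listing B.2.

```java
import java.util.HashMap;
import java.util.Map;

public class UpdateUserSketch {
    // Stand-in for the database.
    private final Map<String, String> db = new HashMap<>();

    void save(String uuid, String name) {
        db.put(uuid, name);
    }

    // The refactored shape: each line is one named step.
    String updateUser(String callerRole, String uuid, String newName) {
        throwIfNotAuthorized(callerRole);
        getUserByUUID(uuid);              // throws if the user does not exist
        validateUpdatedUserFields(newName);
        db.put(uuid, newName);
        return db.get(uuid);
    }

    void throwIfNotAuthorized(String role) {
        if (!role.equals("admin")) throw new SecurityException("not authorized");
    }

    String getUserByUUID(String uuid) {
        String user = db.get(uuid);
        if (user == null) throw new IllegalStateException("user not found");
        return user;
    }

    void validateUpdatedUserFields(String name) {
        if (name == null || name.isEmpty()) throw new IllegalArgumentException("empty name");
    }

    public static void main(String[] args) {
        UpdateUserSketch api = new UpdateUserSketch();
        api.save("u1", "old-name");
        System.out.println(api.updateUser("admin", "u1", "new-name")); // prints "new-name"
    }
}
```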


4.2.3 Reducing Complexity

Trying to understand a function with high cyclomatic complexity can be a mind-bending task. With all the different outcomes, it is difficult to get a clear overview of what could potentially happen when one calls the function.

The function getUser is a function with many decision points. As we can see in listing A.3, there are five different if statements, one of which consists of an or expression. This gives a total of six decision points, some of them nested and some even duplicated. If we use the formula discussed in section 2.4 to calculate the cyclomatic complexity of functions with multiple exit points, we see that the function has a score of six. Six decision points and two exit points give us the calculation M = 6 - 2 + 2.
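
Written out, the calculation appears to apply the multiple-exit-point variant of the cyclomatic complexity formula, with the number of decision points minus the number of exit points plus two (symbols here are illustrative; the exact notation is the one defined in section 2.4):

```latex
M = \pi - s + 2 = 6 - 2 + 2 = 6
```

where $\pi$ is the number of decision points and $s$ the number of exit points.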

What creates problems in the function is the fact that the parameter userId can be both a UUID and a username. It looks as if this functionality was implemented at a later stage, and the programmer who did it "broke a window", as we discussed in chapter 1. Instead of refactoring the function and creating a function that tries to acquire a user-object by both username and UUID, he changed the parameter to the user's UUID whenever it was a username, to fit the existing code, which expects the parameter to be the UUID. This introduced two decision points and makes the function more confusing than it needs to be.

What we did to reduce the complexity of the function was first to extract all of the logic concerned with finding the user-object, both by username and by UUID. We created the function findUser, see listing B.6, which first tries to find a user-object by UUID, and then by username. By doing this we can remove the logic that changes the userId parameter, as well as the logic fetching the user on line 17, after the authorization in the original getUser (listing A.3). We are also using the authorization function discussed earlier, listing B.5.
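
The idea behind such a findUser helper can be sketched as below. This is an illustrative stand-in for listing B.6, not the listing itself: the two maps replace the real DAO lookups, and the exception type is invented.

```java
import java.util.HashMap;
import java.util.Map;

public class FindUserSketch {
    // Stand-ins for the two DAO lookups.
    private final Map<String, String> byUuid = new HashMap<>();
    private final Map<String, String> byUsername = new HashMap<>();

    void addByUuid(String uuid, String user) { byUuid.put(uuid, user); }
    void addByUsername(String username, String user) { byUsername.put(username, user); }

    // First treat userId as a UUID; fall back to treating it as a username.
    String findUser(String userId) {
        String user = byUuid.get(userId);
        if (user == null) {
            user = byUsername.get(userId);
        }
        if (user == null) {
            throw new IllegalStateException("No user for " + userId);
        }
        return user;
    }

    public static void main(String[] args) {
        FindUserSketch repo = new FindUserSketch();
        repo.addByUuid("123e4567", "alice");
        repo.addByUsername("alice", "alice");
        System.out.println(repo.findUser("alice")); // found via the username fallback
    }
}
```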

The result is listing B.3. The function itself has a cyclomatic complexity of 1, and the functions it uses have a maximum of 2. Now that the function is smaller and uses functions with descriptive names, it is easier to understand what is going on. As a result, this new version of getUser also reads like an article.

4.2.4 Error Handling

Throughout the REST-API there are a great many null checks and exceptions thrown. There is also some inconsistency in how the functions handle the different errors. In getUser, listing A.3, null is returned if a user cannot be found, while in updateUser, listing A.2, an exception is thrown instead. Most of the time an exception is thrown, but there is no consistency in the messages sent with the exceptions.

Many of these null checks can be avoided if we handle the error where it occurs instead of just returning null. According to the rule "Don't Return Null" in Clean Code, a function should not return null; it should deal with the null right away[17, p. 110]. The reasoning is that if a function returns null, the problem is only postponed. One would have to deal with it wherever the function is used, and this is something that can be forgotten. If it were forgotten only once, it could lead to severe errors in the system. Therefore it is much better to deal with the problem right away, in the function producing the null. By throwing an exception instead of returning null, there is a smaller chance of introducing errors into the system.

This is what we have done with getUserByUUID, listing B.4. After fetching a user-object we check if it is null. If that is the case, we throw an exception instead of returning null. When a user-object is found, we return it. Instead of letting a potential null value bubble upwards, we deal with it right away. If we let it pass, it is possible that when this function is used somewhere else, or by someone not familiar with the system, they forget to do a null check. Furthermore, as this function is used to retrieve a user-object that is about to be altered or manipulated, there is really no point in returning null if a user is not found, as there is nothing to manipulate. As mentioned earlier, there was inconsistency in how this problem was handled in the system. By doing this we increase the consistency and remove a lot of null checks, as every function now uses getUserByUUID.

4.2.5 Responsibility

Responsibility is an important point of Clean Code, as we discussed in section 2.2. The single responsibility principle means that a class or function should only have one reason to change. There are several functions violating this rule in this system, and one of them is the exportActivatedUsers function, listing A.6.

exportActivatedUsers builds a CSV-file with all the relevant information stored about activated users and other information. The function creates a BufferedWriter, loops through all of the activated users and appends data to the output. This leads to different reasons why this function might have to change:

• The function would have to change if the information exported changes. Say a new variable is added to the user that should also be exported; the function would then have to change to include this in the export. Also, if a variable is no longer relevant for the export, the function would have to change.

• The function depends on several other modules to work, such as BufferedWriter. If the implementation of BufferedWriter changes, the function would have to follow the changes made to BufferedWriter to keep the intended functionality.

• If the requirement specifications on exporting users were to change, the function would have to change. Say a new requirement is to be able to choose the file type. This would force a change.

To make sure that our new service follows the single responsibility principle, we tried to identify the different responsibilities and delegate them to the correct objects:

• The function uses a while loop and a for loop to loop through 200 users at a time. We created a new class, UserPager (listing B.10), and extracted this logic to the class to isolate the responsibility. This class fills an ArrayList with "pages" of 200 users each. This list can be fetched as an iterator, so one can iterate page by page of users. If there were to be changes in how all the activated user-objects are fetched, only this class would have to change.

• There are several lines of append statements in the function, and it is a little difficult to see what is appended and from where. We moved the responsibility of exporting the data to the object holding the relevant data itself. In the user class we created a function that takes a writer as a parameter and then adds all of the relevant data to it, see listing B.11. If we now add more data to the user that we want to export, we only have to change the user and not the export function itself.

• Finally, we created a new class, CSVExporter (listing B.12), whose responsibility is to handle the exporting and creation of the CSV-files. This class takes the new UserPager iterator we created, iterates over all the pages and passes the writer to the user-objects so they append their information to the writer. If the exporting procedure should change, only this class needs to change.
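The decomposition described above can be sketched roughly as follows. This is a simplified, self-contained illustration; the real classes are in listings B.10 to B.12, and all signatures, field names and the StringWriter-based demonstration are assumptions.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustrative sketch of the responsibility split; names are assumptions.
public class ExportSketch {
    static class User {
        final String username, email;
        User(String username, String email) { this.username = username; this.email = email; }

        // The user appends its own data, so adding a new exported field
        // only changes this class, not the exporter (cf. listing B.11).
        void appendTo(Writer writer) throws IOException {
            writer.write(username + "," + email + "\n");
        }
    }

    // Isolates the "fetch users 200 at a time" logic (cf. listing B.10).
    static class UserPager implements Iterable<List<User>> {
        private final List<List<User>> pages = new ArrayList<>();
        UserPager(List<User> allUsers, int pageSize) {
            for (int i = 0; i < allUsers.size(); i += pageSize) {
                pages.add(allUsers.subList(i, Math.min(i + pageSize, allUsers.size())));
            }
        }
        public Iterator<List<User>> iterator() { return pages.iterator(); }
    }

    // Its only responsibility is driving the export (cf. listing B.12).
    static class CSVExporter {
        void export(UserPager pager, Writer writer) throws IOException {
            writer.write("username,email\n");
            for (List<User> page : pager) {
                for (User user : page) {
                    user.appendTo(writer);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        List<User> users = new ArrayList<>();
        users.add(new User("alice", "alice@example.com"));
        users.add(new User("bob", "bob@example.com"));
        StringWriter out = new StringWriter();
        new CSVExporter().export(new UserPager(users, 200), out);
        System.out.print(out);
    }
}
```

Each of the three reasons to change identified earlier now touches exactly one class.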

4.2.6 Unit Tests

Long, large functions are hard to write unit tests for. The more the function does, the harder it is to test every single thing. This is the case for the original code. The functions are large and difficult to test; therefore the original version did not have any unit tests for the API. This makes it much harder to change the code. The question "will this break the code?" must be asked for every single change. In short, there is no safety net [17, p. 124].


Listing B.13 shows a test which would act as a safety net whenever work is done on the function updateUsersPassword, or on what the function depends on. The test expects that a ValidationException will be thrown when an invalid password is passed to the function. If this does not happen the test fails.

Refactoring code with no safety net is not a very good idea. Without unit tests one will not have any control over the changes made. Does the system behave as expected, or has the behavior changed? Unit tests are needed to be able to answer questions such as this throughout the process.

With this in mind, our first step in refactoring was to write small unit tests testing only one concept at a time. With these tests in place we could refactor while feeling safe about the changes, as long as the tests did not break. These unit tests were written as cleanly as possible to keep them readable.

Much of the necessary mocking is done in the setup of the test suite, so the tests are mostly asserts, with some mocking and setup to suit the needs of the particular test. As with variables and functions, naming is important, so the names of the tests describe what the tests are testing and what the expected result should be.

Listing B.14 shows one of the unit tests written to test the function getUser. This particular test is written to make sure that the function also finds a user when it is passed a username as an argument. The test fulfills the rules discussed in section 2.2, both F.I.R.S.T and the "Single Concept Rule". It is short and does what the name says. The dao is a mock. This means that we have complete control over its behavior. We can then tell it to return a dummy user, which is created during setup, whenever the function getUserByUsername is called. By doing this we know that getUser should find a user. If getUser is changed and getUserByUsername is no longer called, the test will fail.
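The idea behind the test in listing B.14 can be sketched like this. The real test uses a mocking framework; the hand-rolled fake dao and every name below are assumptions made to keep the example self-contained.

```java
// Sketch of the kind of unit test described above; the real test is in
// listing B.14, so the fake dao and all names here are assumptions.
public class GetUserTest {
    static class User { final String username; User(String u) { username = u; } }

    interface UserDao { User getUserByUsername(String username); }

    static class UserService {
        private final UserDao dao;
        UserService(UserDao dao) { this.dao = dao; }
        User getUser(String userId) {
            // Simplified: in the real service a UUID lookup is tried first.
            return dao.getUserByUsername(userId);
        }
    }

    public static void main(String[] args) {
        User dummyUser = new User("dummy");
        // The fake dao plays the role of the mock: we fully control its
        // behavior, so we know getUser should find a user.
        UserDao fakeDao = username -> dummyUser;
        UserService service = new UserService(fakeDao);

        // One concept, short, and it does what its name says:
        // getUser with a username should return the user from the dao.
        User found = service.getUser("dummy");
        System.out.println(found == dummyUser ? "test passed" : "test failed");
    }
}
```

Because the dao is fully controlled, the test fails as soon as getUser stops calling getUserByUsername.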


Chapter 5

Results

In this chapter we present the errors made by the participants in the second run of the experiment, and the results from the two runs of the experiment we conducted. We discuss the results in chapter 6. We continue with the naming we established in section 3.5: group A worked on the original version, while group B worked on the refactored version.

5.1 Errors

In this section we present the errors made by the participants during the second run of the experiment. The run was conducted in the same manner as the first run, with the exception of the changes we made, as discussed in section 3.5.4.

Out of the ten participants, half made the same error. In assignment 1, when validating the updated values, they did not check whether the username or email had changed. Instead they assumed the values had changed and jumped straight to checking whether this "new" username or email was in use. If the username has not changed, this fails the update, as the username is in use by the current user. This is not a major error, and one can justify why they assumed every value had been updated, or why they did not think of it. First of all, they do not know the system and they cannot run it. Running the system and experiencing how updating a user works on the frontend gives a great amount of the information needed when writing the updateUser function. Because of this they had to make a decision: should it be possible to update a single value at a time, or is every value updated?
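The missing check can be illustrated with a small sketch. The helper below is hypothetical; it only shows the logic the participants skipped: a changed-value check before the uniqueness check.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the check half the participants skipped.
public class UpdateValidation {
    static final Set<String> USERNAMES_IN_USE = new HashSet<>(Arrays.asList("alice", "bob"));

    // Only reject a username as "in use" when it actually changed;
    // otherwise the current user's own username fails the update.
    static boolean usernameIsValid(String currentUsername, String newUsername) {
        if (newUsername.equals(currentUsername)) {
            return true; // unchanged, no uniqueness check needed
        }
        return !USERNAMES_IN_USE.contains(newUsername);
    }

    public static void main(String[] args) {
        System.out.println(usernameIsValid("alice", "alice")); // unchanged
        System.out.println(usernameIsValid("alice", "bob"));   // taken by another user
        System.out.println(usernameIsValid("alice", "carol")); // free
    }
}
```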

Three of the five who made the error were part of group A. One of the two from group B lacked more logic than just the check of whether the username or email had changed: he was not validating anything. This is a vital part of the assignment. Due to this, we claim that the number of errors in assignment 1 is evenly spread between the groups.

Only two failed to find the bug. One had a wrong answer, while the other did not have time to finish. Three participants in the first run of the experiment also failed to find the bug before they had to leave. All the participants who had to leave before they found the bug worked on the refactored version. What is important to note is that these participants did not use more time on the first two assignments than the others. On the contrary, some of these participants solved the other assignments with some of the best times. This is important because it shows that it is not a lack of skills making them miss the bug. They spent on average 32 minutes looking for the bug before they had to leave, while group A participants found the bug in an average of 20 minutes.

Other than these, no further faults were made.

5.2 First Run of the Experiment

The results from the first run of the experiment show that group B finished with the best average time on two of the three assignments. We now present the results from each assignment.

5.2.1 Assignment 1

Group A's best time was 42 minutes. They averaged around 49 minutes, while group B averaged 46 minutes. The first participant to solve the assignment was part of group B. He solved it in 24 minutes, while the next person finished 7 minutes later. He was also part of group B.

As we can see from figure 5.4, the participants of group A finished with a more consistent result than the participants of group B. B had a much larger spread in the results.

5.2.2 Assignment 2

Assignment 2 had a great span between the groups. Group B finished with an average of 8 minutes, while group A finished with an average of 23 minutes. B finished 15 minutes faster than group A. The first participant to finish was from group B, after 5 minutes.

Figure 5.5 shows that the results of group A are once again very consistent, but so are group B's.


5.2.3 Assignment 3

This is the assignment with the most notable time difference. Group A had an average time of 8 minutes, while group B had an average of 35 minutes. Group B used 27 more minutes than group A. Two of the best times came from group A, at three minutes each. The closest participant from group B found the bug after 19 minutes.

Unfortunately, three of the participants from group B had to quit the experiment before they could locate the bug, see figure 5.3. They had to leave for personal reasons. We decided to use the time they spent looking for the bug before they had to leave as their result, because they had already spent more time than any member of group A.

5.3 Second Run of the Experiment

In the second run of the experiment, group A finished two of the three assignments with the best average time. We will now present the results from each assignment.

5.3.1 Assignment 1

Group A finished the assignment with an average time of 37 minutes. This is 10 minutes faster than group B's average of 47 minutes. Thanks to the changes we made to the experiment, see section 3.5.4, we now know that this result only reflects the time used to solve the assignment.

Even though group A had the best average time, three of its participants made the common error we discussed in section 5.1. In group B, one made the same error as group A, while one made a greater error: not validating anything. The best error-free solution came from group B, after 25 minutes.

5.3.2 Assignment 2

Group A completed the assignment with an average time of 21 minutes, 14 minutes behind group B's average time of seven minutes.

Figure 5.12 shows that all members of group B finished within 10 minutes, except one who spent 15 minutes. Group B's fastest participant used two minutes.


5.3.3 Assignment 3

Once again group B had difficulties with this assignment, despite the actions we discussed in section 3.5.4. Group B finished with an average time of 36 minutes, 16 minutes behind A.

Even though group B had a worse average time, one of the best times came from their group. This participant finished in six minutes, exactly the same as the best member of group A. One of the participants in group B had to leave before he could start the assignment, so we have removed him from the average time. This means group B's average time for this assignment is calculated from one less participant than group A's.


Figure 5.1: First Run of the Experiment: Table Legend

Yellow   Had to leave before finding the bug
Green    The fastest time in the run

Figure 5.2: First Run of the Experiment: Group A Data

                 A    B    C    D   Average Time
Assignment 1    64   46   42   43      48.75
Assignment 2    21   38   17   17      23.25
Assignment 3     3    3   22    4       8.00
Total Time      88   87   81   64      80.00

Figure 5.3: First Run of the Experiment: Group B Data

                 A    B    C    D   Average Time
Assignment 1    59   72   31   24      46.50
Assignment 2    10    5   13    6       8.50
Assignment 3    27   28   19   66      35.00
Total Time      96  105   63   96      90.00


Figure 5.4: First Run of the Experiment: Time used by participants on assignment 1 (bar chart; time in minutes per participant, with each group's average marked)

Figure 5.5: First Run of the Experiment: Time used by participants on assignment 2 (bar chart; time in minutes per participant, with each group's average marked)


Figure 5.6: First Run of the Experiment: Time used by participants on assignment 3 (bar chart; time in minutes per participant, with each group's average marked)

Figure 5.7: First Run of the Experiment: Average time per assignment (bar chart; average time in minutes for groups A and B on assignments 1-3)


Figure 5.8: Second Run of the Experiment: Table Legend

Red      Wrong solution
Orange   Minor errors in solution
Yellow   Had to leave before finding the bug
Green    The fastest time in the run

Figure 5.9: Second Run of the Experiment: Group A Data

                 A    B    C    D    E   Average Time
Assignment 1    40   48   24   21   52      37.00
Assignment 2    12   32   26   13   24      21.40
Assignment 3    11    6   15   32   35      19.80
Total Time      63   86   65   66  111      78.20

Figure 5.10: Second Run of the Experiment: Group B Data

                 A    B    C    D    E   Average Time
Assignment 1    36   66   50   58   25      47.00
Assignment 2     6    6   15    8    2       7.40
Assignment 3     6    X   60   25   55      36.50
Total Time      48   72  125   91   82      83.60


Figure 5.11: Second Run of the Experiment: Time used by participants on assignment 1 (bar chart; time in minutes per participant, with each group's average marked)

Figure 5.12: Second Run of the Experiment: Time used by participants on assignment 2 (bar chart; time in minutes per participant, with each group's average marked)


Figure 5.13: Second Run of the Experiment: Time used by participants on assignment 3 (bar chart; time in minutes per participant, with each group's average marked)

Figure 5.14: Second Run of the Experiment: Average time per assignment (bar chart; average time in minutes for groups A and B on assignments 1-3)


Chapter 6

Discussion

In chapter 5 we presented the results from the two runs of the experiment we have conducted. The results from the two runs vary, but the implementation of the two runs was slightly different as well. This gives us room for discussion and analysis.

In this chapter we go through and discuss the results from each assignment and the limitations of our experiment. We continue with the naming we established in section 3.5 for the groups: group A worked on the original version, while group B worked on the refactored version.

6.1 Assignment 1

Assignment 1 is the largest assignment, and it focuses on implementing new functionality. The goal of the assignment is to see if there is a difference in the understandability and reusability of the two versions. The assignment text given in the experiment was:

"Right now this user-API does not support updating a user. Your first assignment is to implement this. Remember: The user must be authorized. A user's username, email and uuid must all be unique."

Assignment 1 is explained in more detail in section 3.3.1.

6.1.1 First Run of the Experiment

The groups started very even, finishing assignment 1 in 49 and 46 minutes respectively. Group B finished a little faster than group A, as we had expected. We have to keep in mind that these results reflect both the time used to solve the assignment and the time used to find the problem area. Whether group B was better at finding the correct class or at solving the assignment is impossible to say. In other words, we cannot claim that this result shows that the refactored version is more understandable. But as the results are similar, it seems likely that it is at least as understandable as the original version.

6.1.2 Second Run of the Experiment

In the second run, group A had the best average time on assignment 1, with 37 minutes. Group B used 10 more minutes to finish; even so, the fastest solution came from group B, at 25 minutes. We expected B to finish first, as in the first run, but this time we had a different outcome. Unlike the first run, this result only reflects the time used to solve the assignment, which gives it more credibility. This result suggests that it is easier to add new features to the original version.

6.1.3 Discussing Assignment 1

Our experiment had different results for this assignment. The results from the first run suggest that the refactored version is more understandable. But as we have discussed earlier, the result reflects both finding the correct class and solving the assignment. This lowers the quality of the data, and we cannot prove with this data that Clean Code made the system more understandable.

In the second run of the experiment we fixed this problem, and now the original version had a better finishing time. This result has more quality and credibility than the results from the first run, as it only reflects the time used to solve the assignment.

So why is this? Why did participants finish faster on the version we thought to be more complicated and less understandable?

One reason could be that the participants working on the original code felt that they did not have to spend time writing good code, due to the state of the existing code. This made them write a quick and dirty solution that matches the quality of the code already in the system. Group A also had more solutions with errors, which could point to the fact that the solutions were not as thought through as they should be. This mentality resembles the characteristics of the "Broken Window Theory" we mentioned in chapter 1.

This mentality works in this experiment, as the whole point is to solve the assignment as fast as possible and then never work with the code again.


But in a project where the developer works with the same code over months or years, these quick and dirty fixes hurt the maintainability of the system.

The refactored version, on the other hand, shows signs of someone caring about the code. This might make the participants more reluctant to create a quick and dirty solution, and in the long run this will lead to more maintainable code. In the refactored version, one wants to make sure that there is a function available that could be used to solve the assignment before creating it. When the code is built as it is, with a lot of functions solving different problems, one would want to use as many as possible so code is not duplicated.

This makes the participants spend more time looking around in the system and the class they are working in. They also might care more about their code, putting in some extra minutes to make their code look good. Since we collected the code from the second run of the experiment, we can clearly see that most of the participants in group B tried to write their code in the same manner as the system. In other words, Clean Code encourages the participants to write code of higher quality and more maintainable code, as we discussed in section 2.5.

Another reason for the participants finishing faster on the original version could be experience. Most of our participants were students, and they might not be experienced with larger systems or Clean Code. Without this experience they could be more accustomed to long functions. If this is the case, they should be more effective on the original version, while working on the refactored version could be challenging. Unfortunately we have no data on the participants' background and experience. If we had had a questionnaire about background and preferences, we might have been able to tell if this is a factor.

To conclude, our research suggests that Clean Code does not decrease the time used to implement significant new functionality. It does however lead to code of higher quality, as defined in section 2.4. So in larger projects where maintainability is important, the increase in implementation time when writing Clean Code could be worth it, due to the increase in both quality and maintainability. In smaller projects, with tight schedules and less need for maintainability, Clean Code might not be necessary.

Everyone writes code differently. Developers have their own individual preferences and styles when writing code. If our participants were unaccustomed to the style of Clean Code, this could also explain why it increased the time to implement functionality. Since this research focuses on the immediate effect of Clean Code, we cannot know if development time decreases when a developer becomes more comfortable with Clean Code. A study over a longer period could be more suited to research this.


6.2 Assignment 2

Assignment 2 changes the focus over to changing current functionality. The result of the assignment reflects how changeable and understandable the code is. The assignment text given in the experiment was:

"Almost every function takes a parameter called "userId". Right now the API treats this as a UUID. Your assignment is to change the functionality so all of these functions also try to fetch a user by username if it fails to find a user by UUID. In other words, after this change "userId" could also be a username."

Assignment 2 is explained in more detail in section 3.3.2.

6.2.1 First Run of the Experiment

Group B finished assignment 2 with an average time of 8 minutes, 15 minutes faster than A's average time of 23 minutes. This result suggests that Clean Code is easier to change, as we had expected.

6.2.2 Second Run of the Experiment

Assignment 2 in the second run of the experiment had a similar result to the first run. Group B finished much faster than A. B spent 7 minutes on average, while A spent 21 minutes on average. The difference is almost as big as the difference in the first run. The fastest solution came from group B, at 2 minutes.

Group B's fastest participant must have noticed the findUser function during assignment 1 and done a quick search and replace. The result is as we expected: group B finishing faster than group A.

6.2.3 Discussing Assignment 2

In this assignment group B already has a function doing exactly what the assignment asks for, see listing B.6. This gives them a slight edge, but not a considerable one, as it should not take group A too long to write it. After group A writes the function solving the assignment they are on equal ground; even so, the time difference is rather large. This means that there might be more to why there is such a great gap.
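The behavior of such a function (listing B.6 in the refactored version) might look roughly like this sketch; the name findUser comes from the text, but the signature and the map-backed lookups are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the findUser helper described above; names and the
// map-backed lookups are assumptions made for illustration.
public class FindUserSketch {
    static class User { final String name; User(String n) { name = n; } }

    static final Map<String, User> BY_UUID = new HashMap<>();
    static final Map<String, User> BY_USERNAME = new HashMap<>();

    // Try the UUID lookup first, then fall back to a username lookup,
    // so "userId" may be either, exactly as the assignment requires.
    static User findUser(String userId) {
        User user = BY_UUID.get(userId);
        if (user == null) {
            user = BY_USERNAME.get(userId);
        }
        return user;
    }

    public static void main(String[] args) {
        User alice = new User("alice");
        BY_UUID.put("uuid-1", alice);
        BY_USERNAME.put("alice", alice);
        System.out.println(findUser("uuid-1").name);
        System.out.println(findUser("alice").name);
    }
}
```

With the fallback isolated in one named function, solving the assignment reduces to replacing the direct UUID lookups with calls to it.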

So what is causing this great difference? That group B does not have to write the function is of course a factor, but what might matter more is that the function gives away the answer. When they see the function they immediately understand what the assignment is asking for, as the function answers all of their questions about the assignment. So instead of thinking about a solution or how to solve it, they only have to do a search and replace. Group A, on the other hand, have to read the assignment, comprehend what it is actually asking them to do, and then come up with a solution. This could be where time is spent, and group B skips this entirely when they have a solution available.

Another challenge in the original version is to identify where the change is needed. The way the functions fetch a user is not standardized, and it is done in different parts of the functions. This makes changing the functions more complicated and time consuming, as the participants have to go through each function carefully to understand and find where the user is fetched. Sometimes they also have to change some of the surrounding code to solve the assignment.

In the refactored version, on the other hand, this is not a problem. Here Clean Code solves all of the problems one might experience in the original version. The function clearly shows where a user is fetched because of its name, and because the logic is already separated from the rest of the function, they do not have to worry about the surrounding code.

Would the result have been different if group B had not had the findUser function? This would have forced them to spend more time on understanding the assignment and then creating a solution. If we assume the groups are completely even in skill, this should take equally long. Next is to implement the solution in all the functions fetching a user. As we discussed earlier, this is much easier in the refactored version, and group B should spend less time on this.

In conclusion, our research suggests that Clean Code decreases the time used to change current functionality when the logic needed is available. But as we just discussed, even without the logic Clean Code should decrease the time.

6.3 Assignment 3

Assignment 3 is about bug fixing. There is no implementation of new code, so the assignment tests how readable and understandable the two versions are. The assignment text given in the experiment was:

"There is a bug preventing users from being created with the "createUser" function. Your assignment is to find this bug."

Assignment 3 is explained in more detail in section 3.3.3.


6.3.1 First Run of the Experiment

In the final assignment, group A finished with an average time of 8 minutes, while B spent 35 minutes on average. We were not expecting this major difference. We were prepared for the groups to be fairly even, but that was just not the case.

As we discussed in section 3.5.4, the participants in group B experienced some problems with the assignment text, which is one of the reasons they spent so much more time solving the exercise. The first participant to start on assignment 3 was from group B, and he had only worked for 30 minutes: 24 minutes on the first assignment and 6 on the second. He then spent 66 minutes looking for the bug, 30 minutes longer than anyone else, before he had to quit. He was one of the participants who misread the assignment and only looked "inside" createUser, without exploring the function calls.

This is not the result we expected, but the problems we experienced with the assignment text could have affected our result.

6.3.2 Second Run of the Experiment

Up until this assignment, both groups had the best average time on one assignment each. Group A finished assignment one fastest, while group B finished assignment two first. This assignment proved to be a lot more difficult and time consuming for group B than we had initially thought after changing the assignment text. Group B finished with an average time of 36 minutes, 16 minutes behind A.

The fastest solutions came from both groups, at 6 minutes each. This group B did better than the group B of the first run of the experiment, so changing the assignment text might have made it clearer for them where to look. Even so, the result is not what we had expected.

6.3.3 Discussing Assignment 3

In the first run of the experiment the assignment text caused some problems and made the results questionable. Many of the participants working on the refactored version misunderstood the text and thought that there was a bug in just the createUser function. At first they did not explore the functions called; they only looked inside createUser. This did not lead them to a correct answer, as the bug is inside one of the functions called. The result mostly reflects the time they spent on this before realizing that the bug was not "inside" createUser and starting to go through the calls. Unfortunately, three of the participants had to leave before finding the bug. They spent the majority of assignment 3 looking at the ordering of the function calls.

In the second run, the assignment had a new description, and none of the participants mentioned that the text was confusing or misleading in any way. Even so, it is fair to suspect that group B spent some time looking at, and thinking about, the ordering of calls in createUser. This applies to all the other functions they go through as well. Essentially this means that group B naturally spent more time looking for the bug because they had to understand the logic and spot possible logical errors, while also considering the order of calls. Group A did not have this problem, because for them createUser is only one large function.

Another advantage of looking for the bug in the original code is that all the logic is inside the function. Because of this, the participants know the bug is somewhere between line X and Y. This allows them to start at line X and slowly go through each line, carefully looking for anything that looks out of place, wrong, or that does not make any sense. Even though the function is rather large, it is not overwhelming when utilizing this kind of approach to solve the problem.

The refactored version of the function, on the other hand, is nothing but function calls. This might give the participants a feeling of being lost, because they do not know where to start. They may also feel they have to go through a lot of code, since some functions call other functions, bringing them deeper into the code. Some might lose track of the logic, and get confused or feel overwhelmed.

In the original code the bug is out in plain view; it is right there in the open. In the refactored version it is hiding much deeper in the code. The participants have to explore two function calls to get to the right function. Even so, one of the best times on this assignment was six minutes, and it was achieved on the refactored version. What we discussed in section 6.1.3 about experience and preference of code style applies here as well. If the participants were more accustomed to Clean Code, they would have more experience with debugging this code style. The fact that one of the fastest solutions was from the refactored version could suggest this.

Debugging a system can be done in many ways, and developers usually have their own ways of doing it. Some might like to use a debugging tool; others rely on running the system, forcing the bug to occur and then reading logs for information. Other approaches might be equally viable. Because the participants were not able to run the system, some might have been taken out of their comfort zone and found it difficult to claim that something was a bug. They were, however, able to run the unit tests created in the refactored version, but this is something no one took advantage of. Some mentioned that since they were told not to build the system, they did not think of unit tests at all. This was an unfortunate misunderstanding. When we said there was no point in running the system, we did not mean to imply that they could not build it.

As we discussed in section 6.1.3, individual preference and experience affect this assignment as well. Participants accustomed to Clean Code will find it easier to locate the bug than the ones who are not. A study over a longer period of time could also be conducted to research this.

If we assume that all the participants were accustomed to Clean Code, our result indicates that Clean Code does not improve the time to find and solve bugs. On the other hand, if we assume the opposite, that the participants were unaccustomed to Clean Code, our results suggest the same, but with practice it could improve the time. As most of our participants were students, it is a possibility that they had little experience with Clean Code.

6.4 Implications For Practice

Our results suggest that Clean Code does not improve understandability, but what do the results mean? We acquired the results by conducting an experiment in which the time spent on each assignment was measured. The participants were aware that their time was being recorded, and they knew they would never work with this code again. This creates a competitive environment in which the participants want to finish as fast as possible, which in turn promotes quick and dirty solutions. This was not our intention; we wanted an environment where the participants felt they were working on something important, close to a real development project.

Taking this into consideration, our results mean the following: when solving assignments with the goal of finishing as fast as possible, never to work with the code again, it is faster to do so on code resembling the one we used, that is, code with large and long functions.

So perhaps the effect Clean Code has on understandability varies with the size of the project. In smaller, shorter projects the effort needed to keep the system clean might not be worth it, as the system is already understandable enough. With all the logic in one place, one does not have to understand how all the smaller functions of Clean Code work together. Abstraction is also kept to a minimum, making the code less complex. The results from assignments 1 and 3 suggest this.

In larger, lengthier projects, where maintainability is important, the effect of Clean Code might be more apparent and valuable. Clean Code also encourages writing good code, so the quality of the project is kept up; in the long run, Clean Code could therefore be valuable. Over a longer period, developers also become more accustomed to Clean Code, which could further increase its value. The results from assignment 2 suggest this, but more research would have to be done.

6.5 Comparing our Results to a Previous Study

In section 2.6.3 we presented a recently published study [3], by Erik Ammerlaan, Wim Veninga and Andy Zaidman, that is very similar to ours. The major difference is its size: it is larger in every aspect, with larger refactorings, multiple experiments and more participants. Despite having a smaller study, we reached similar conclusions.

When discussing the results from one of their assignments on a smaller refactoring, they argue: "The flow of method arguments and return values might have been more difficult to understand than a linear flow in a large method." [3] This is very much in line with what we read from the result of assignment 3, where the original code proved to be more understandable.

They further discuss this by suggesting that the habits of the participants could explain why the original code is more understandable. What lends their argument weight is that all of their participants were employed in the same software company and worked with the original code on a daily basis. In our case we argued that experience with Clean Code and larger systems could be a factor: our participants were mostly students, and they might not be comfortable with large code bases and the Clean Code style.

From the results of assignment 1 we saw that the participants on the refactored version produced solutions of higher quality than the other group. We proposed that Clean Code encourages developers to write code of higher quality, leading to better maintainability. Ammerlaan et al. point to similar findings in one of their experiments.

Unlike our study, they acquired data on the benefits of unit tests. Two of their participants wrote unit tests to identify the bug in the assignment, and as expected they quickly found the bug and solved the assignment. It is only two out of thirty participants, but it still shows the power of having testable code.

Even though our study is smaller, we have gathered data suggesting much the same as the larger study of Ammerlaan et al. This shows that a long-term study is needed to clearly see the effects of refactoring or Clean Code on understandability. A study similar to the one we presented in section 2.6.2, but where the time spent on user stories is measured, could be ideal.


6.6 Limitations

We have identified some limitations of our experiment. Here we present these limitations and discuss what they mean for our results.

6.6.1 Quantitative Problem

In section 2.6.1 we mentioned that previous studies on the effect of refactoring on code quality suffer from a quantitative problem. This experiment suffers from the same problem.

We have a small number of assignments and only one system. With only three different assignments on one system, we cannot necessarily generalize the results; they only represent Clean Code's effect on this particular system.

To avoid this problem we would have had to refactor multiple systems instead of just one, create new assignments for each system, and finally conduct an experiment for every refactored system. We felt this was too much work for a master's thesis and decided to focus on a single system.

6.6.2 The Participants

We had a total of 18 participants, 9 in each group. This low number can affect the results, as individual skill has a greater impact on the average time when the sample size is small. We assigned the participants randomly to their groups in an attempt to spread individual skill evenly. Judging from our results this seems to have helped, as the results are fairly consistent within each group. However, all of our participants could still be significantly better or worse than the norm.

Furthermore, 14 of our 18 participants were students. This means that our sample mostly represents students, not developers in general. Even so, the students had results similar to those of the four developers who participated.

6.6.3 Is Assignment 2 Fair?

Assignment 2 can be viewed as unfair, in that it favors the refactored version too much to be used as an assignment. The argument is that the refactored version contains a function which does exactly what the assignment asks for; all the participants have to do is make use of this function.

The original version, on the other hand, does not have a ready-made function solving the assignment, but the required logic is still present. The only difference is that it is part of a larger function, the getUser function (listing A.3). In other words, the logic is just not as visible in the original version as in the refactored version, where it is extracted into its own function. This is one of the reasons to follow the Clean Code principles: logic can easily be found and reused when needed.

Because of this we believe that the assignment is completely fair and canbe used.
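The extract-method step at the heart of this argument can be sketched in miniature. The following is a hypothetical illustration, not the thesis code: the maps and names are simplified stand-ins. Once the lookup logic is pulled out of a large function into its own small function, any caller can find and reuse it:

```java
import java.util.HashMap;
import java.util.Map;

public class ExtractMethodSketch {
    // Simplified stand-ins for the data access layer: users stored by UUID,
    // plus a username-to-UUID index.
    static Map<String, String> usersByUuid = new HashMap<>();
    static Map<String, String> uuidByUsername = new HashMap<>();

    // After extraction: one small function answers "resolve this id to a user",
    // whether the caller passed a UUID or a username. Before extraction this
    // logic would sit inline inside a much larger getUser.
    static String findUser(String userId) {
        if (usersByUuid.containsKey(userId)) {
            return usersByUuid.get(userId);
        }
        String uuid = uuidByUsername.get(userId);
        return uuid == null ? null : usersByUuid.get(uuid);
    }

    public static void main(String[] args) {
        usersByUuid.put("uuid-1", "Alice");
        uuidByUsername.put("alice", "uuid-1");
        if (!"Alice".equals(findUser("uuid-1"))) throw new AssertionError();
        if (!"Alice".equals(findUser("alice"))) throw new AssertionError();
        if (findUser("nobody") != null) throw new AssertionError();
        System.out.println("extracted lookup resolves both id kinds");
    }
}
```

The design point is the visibility of the logic: a named function advertises what it does, whereas the same code inline in a long method has to be rediscovered by every reader.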

6.6.4 Not a Real Working Environment

The system cannot run. This makes it more difficult for the participants to understand the system and what the code is trying to do. Seeing the system run helps a lot when working with something unfamiliar; as the participants were unable to do this, all they had was the code itself. This creates a working environment which is not very realistic. Such an environment could take the participants out of their comfort zone, and their performance could suffer as a side effect. Even if this were the case, both groups had the same handicap, so it should not affect the results.


Chapter 7

Conclusion

The goal of this research was to investigate the effect of Clean Code on code understandability. To determine this, we acquired a system with many of the characteristics of legacy code, including code smells. A class in this system was refactored to follow the principles of Clean Code. We then conducted an experiment where two groups of participants solved assignments on separate code versions: one group worked on the original code, the system in its original form, and the other group on the version we refactored.

We continue with the naming we established in section 3.5 for the groups: group A worked on the original version, while group B worked on the refactored version.

7.1 Answering the Research Questions

In this section we conclude this thesis. Before concluding, we answer our research questions, starting with the sub-questions and finishing with the main research question.

Will Clean Code improve the time used to implement new significant functionality?

According to our results, Clean Code does not improve the time used to implement new significant functionality. The first run of the experiment showed that Clean Code improved the time, but that result contains noise and cannot be relied upon to answer the question. The second run, on the other hand, produced more trustworthy results. Group A finished the assignment with an average time of 37 minutes, 10 minutes faster than group B.

It is these results we use when concluding that Clean Code does not improve the time used. Indeed, it seems to increase it.

Will Clean Code improve the time used to change current functionality?

Our results clearly point to this being true. In both runs of the experiment, group B finished with a much better average time than group A: 15 minutes faster in run 1, and 14 minutes faster in run 2. Clean Code seems to improve the time used when changing current functionality.

Will Clean Code improve the time used to find and solve a bug?

The results indicate that this is not true. Group A finished with a better average time in both runs of the experiment: 27 minutes faster in run 1, and 16.7 minutes faster in run 2. Clean Code does not improve the time used to find and solve a bug; on the contrary, it seems to increase it.

Now that we have answered our sub-questions, we can use these answers to answer our main research question.

Will Clean Code improve the time used to solve small coding tasks?

Before starting this research we expected the results to show that Clean Code improves the time used to solve small coding tasks, and thus that the code is more understandable. The reason for this expectation is that understandability is one of the acclaimed benefits of Clean Code: code following the principles should be self-explanatory, with small functions, descriptive names and good abstraction.

Despite our expectations, the results from the experiment show that we were wrong. Only one of the three sub-questions was answered in favor of Clean Code. Both implementing new functionality and finding and solving a bug were completed faster on the original system. Given this result, we cannot claim that Clean Code improves the time used to solve small coding tasks. In other words, there seems to be no immediate understandability benefit of Clean Code.


Our research points to other benefits than understandability being achieved, such as maintainability and code quality. The participants working with the refactored version in our experiment continued to write code following the Clean Code principles, maintaining the quality of the code. Because of this, it is possible that with time, as developers become accustomed to Clean Code, the time used on small coding tasks will decrease.

The size of the system could also have an impact on Clean Code's effect on understandability. The effect could be more apparent in larger systems, where there is more code and there are more modules, classes and interfaces to comprehend. Our research was done on only a small part of a system, so this was not something the participants had to consider.

7.2 Future Work

Now that this thesis is concluded, we list some thoughts on further work:

• In our research we refactored a class in a system and compared it to the original. This limited the assignments to just this part, and other parts of the system were not touched by the participants. It would be interesting to compare an entire refactored system to the original; the assignments could then be spread over several parts of the system. Data from such research could show whether Clean Code is more valuable in larger systems.

• We discussed in section 6.1.3 how experience with Clean Code can be a factor, and a long-term study could answer this. Doing the study described above over a longer period would give developers time to become accustomed to Clean Code. We believe that with experience, developers would be able to utilize the benefits of Clean Code better. Such a study could also tell us whether Clean Code maintains code quality, and what effect Clean Code has on larger, more complex assignments.


Appendix A

Original Code

All of the code in this appendix is presented with permission from the owner. To reuse or copy the code, permission must be granted by the owner.

A.1 Create User

Listing A.1: Original: Create User

public User createUser(@AuthLimit(required = false) Subject subject, User newUser,
    @HeaderParam("X-Forwarded-For") String ip) {
  logger.debug(append("method", "createUser"), "subject={}", subject);

  checkUserFields(newUser);

  String password = null;

  if (newUser.getBase().has("password")) {
    password = newUser.getBase().get("password").getAsString();
  } else {
    logger.info(append("method", "createUser"), "Request without a password");
    throw new ValidationException(new Error("Validation", "ValidationException",
        "Sorry, you are missing a password!"));
  }

  Validator.throwIfInvalidPassword(password);

  UUID uuidGenerated = UUID.randomUUID();
  String uuid = uuidGenerated.toString();

  newUser.setUUID(uuid);

  long currentTime = System.currentTimeMillis();

  newUser.setCreated(currentTime);
  newUser.setModified(currentTime);
  newUser.setActivated(true);
  newUser.setEmail(newUser.getEmail().toLowerCase());

  logger.debug(append("method", "createUser"), "createUserOk subject={}, email={}, username={}",
      new Object[] {subject, newUser.getEmail(), newUser.getUsername()});

  if (newUser.getBirthday() != null && newUser.getBirthday().after(new Date())) {
    logger.info(append("method", "createUser"), "Invalid birthday, is larger than today!");
    throw new ValidationException(new Error("Birthday is not valid!"));
  }

  if (dao.getUserByEmail(newUser.getEmail(), false) != null) {
    logger.info(append("method", "createUser"), "Email is already in use!");
    throw new ValidationException(new Error("duplicate_unique_property_exists",
        "Unique check failed", "This email is already in use!"));
  }

  if (dao.getUserByUsername(newUser.getUsername(), false) != null) {
    logger.info(append("method", "createUser"), "Username is already in use!");
    throw new ValidationException(new Error("duplicate_unique_property_exists",
        "Unique check failed", "This username is already in use!"));
  }

  AuthUser au = new AuthUser();
  au.setUuid(newUser.getUuid());
  au.setCreated(currentTime);
  au.setModified(currentTime);

  JsonElement je = newUser.getBase().get("password");
  newUser.getBase().remove("password");

  au.updatePassword(je.getAsString());

  if (ipLocationLookup != null) {
    ip = IPLocationLookup.splitIPHeader(ip);
    JsonObject locationObject =
        LocationMetadata.createLocationMetadata(ipLocationLookup.lookup(ip));
    newUser.addMetadata(LocationMetadata.METADATA_NAME, locationObject);
  }

  dao.saveAuthUser(au);
  dao.createUser(newUser.getUuid(), newUser);

  logger.debug(append("method", "createUser"), "return id={}", newUser.getUuid());

  try {
    AuthenticateResponse access = logInUser(newUser, password, false);
    String token = access.getAccessToken();
    newUser.setLoggedInToken(token);
  } catch (Exception e) {
    logger.warn(append("method", "createUser"), "Logging in user after creation failed!", e);
  }

  newUser.setShowMetadata(MetadataVisibility.OWNER);
  return newUser;
}


One of the largest functions in the API, and a good example of a function doing more than one thing.

A.2 Update User

Listing A.2: Original: Update User

public User updateUser(@PathParam("userId") String userId, @AuthLimit Subject subject,
    User updated) {

  String uuid = (String) subject.getSession().getAttribute(CouchbaseRealm.USER_UUID_PROP);

  logger.debug(append("method", "updateUser"), "uuid={}, userId={}", uuid, userId);

  Validator.throwIfInvalidUUID(userId);

  if (updated == null) {
    logger.info(append("method", "updateUser"), "Missing user entity, returning blank");
    throw new ValidationException(new Error("User entity missing!"));
  }

  if (uuid.equals(userId) || subject.hasRole("admin")) {
    User oldUser = dao.getUserByUUID(userId);

    if (oldUser == null) {
      logger.info(append("method", "updateUser"), "User trying to edit non-existing user!");
      throw new ValidationException(new Error("User not found!", "User not found!",
          "User not found!", 404));
    }

    User tempUpdated = dao.getUserByUUID(userId);
    tempUpdated.clone(updated);

    updated = tempUpdated;

    if (updated.getBirthday() != null && updated.getBirthday().after(new Date())) {
      logger.info(append("method", "updateUser"), "Invalid birthday, is larger than today!");
      throw new ValidationException(new Error("Birthday is not valid!"));
    }

    if (!oldUser.getUuid().equals(updated.getUuid())) {
      logger.warn(append("method", "updateUser"), "User trying to change uuid!");
      throw new ValidationException(new Error("Invalid action!"));
    }

    if (oldUser.getUsername() == null || !oldUser.getUsername().equals(updated.getUsername())) {
      logger.debug(append("method", "updateUser"), "changed username");

      User existingUser = dao.getUserByUsername(updated.getUsername(), false);
      if (existingUser != null) {
        logger.info(append("method", "updateUser"), "Username not available!");
        throw new ValidationException(new Error("Requested username is already in use!"));
      }
    }

    logger.debug(append("method", "updateUser"), "uuid={}, userId={}", uuid, userId);

    dao.updateUser(updated);

  } else {
    throwUnauthorizedException();
  }

  updated.setShowMetadata(MetadataVisibility.OWNER);
  return updated;
}

One of the largest and most complex functions in the API. This is mostly because it does more than just update a user: it validates several values and authorizes the caller before actually updating the user.

A.3 Get User

Listing A.3: Original: Get User

public User getUser(@PathParam("userId") String userId, @AuthLimit Subject subject) {

  String uuid = (String) subject.getSession().getAttribute(CouchbaseRealm.USER_UUID_PROP);

  logger.debug(append("method", "getUser"), "uuid={}, userId={}", uuid, userId);

  if (!Validator.validUUID(userId)) {
    User u = dao.getUserByUsername(userId, true);

    if (u == null)
      Validator.throwIfInvalidUUID(userId);

    userId = u.getUuid();
  }

  if (uuid.equals(userId) || subject.hasRole("admin")) {
    User user = dao.getUserByUUID(userId);
    logger.debug(append("method", "getUser"), "user by uuid={}", user);
    if (user == null) {
      user = dao.getUserByUsername(userId, true);
      logger.debug(append("method", "getUser"), "user by userId={}", user);
    }
    if (user != null) {
      return user;
    } else {
      logger.info(append("method", "getUser"), "Could not find user with userId={}", userId);
    }
  } else {
    throwUnauthorizedException();
  }

  return null;
}

A fairly simple function cluttered with if statements and null checks, because the parameter userId can be either a UUID or a username.

A.4 Get a User By UUID

Listing A.4: Original: Get a User By UUID

User oldUser = dao.getUserByUUID(userId);
if (oldUser == null) {
  logger.info(append("method", "updateUser"), "User trying to edit non-existing user!");
  throw new ValidationException(new Error("User not found!", "User not found!",
      "User not found!", 404));
}

One of the most repeated pieces of code in the API, and one of the easiest to refactor into its own function. Almost every function in the API operates on a user, so fetching one is necessary.

A.5 Authorize User

Listing A.5: Original: Authorize User

if (uuid.equals(userId) || subject.hasRole("admin")) {
  // Some logic
} else {
  throwUnauthorizedException();
}

The authorization check done in almost every function, to verify that the current user in the session matches the user being manipulated, or that the session user is an admin.


A.6 Export Users

Listing A.6: Original: Export Users

public Response exportActivatedUsers(@AuthLimit(roles = "admin") Subject subject) {
  StreamingOutput stream = new StreamingOutput() {
    @Override
    public void write(OutputStream os) throws IOException, WebApplicationException {
      Writer writer = new BufferedWriter(new OutputStreamWriter(os));

      writer.append("Full name;Username;Email;Org;Primary usage;Created;Last use (builder);"
          + "games created;games favourited;games shared;games been favourited;"
          + "games been shared;games been played;Players on users games;Answers on users games\n");

      DateFormat df = new SimpleDateFormat("yyyy-MM-dd");

      UserList users = null;

      boolean finished = false;
      String lastCursor = null;

      while (users == null || !finished) {
        users = dao.getAllUsers(200, lastCursor, false);

        lastCursor = users.getCursor();

        for (User u : users.getUsers()) {
          String name = u.getName();
          if (name != null)
            writer.append(u.getName().replace(';', ' '));

          writer.append(';');
          writer.append(u.getUsername());
          writer.append(';');
          writer.append(u.getEmail());
          writer.append(';');
          String org = u.getOrganisation();
          if (org != null)
            writer.append(u.getOrganisation().replace('"', ' '));
          writer.append(';');
          writer.append(u.getPrimaryUsage());
          writer.append(';');

          Date created = new Date(u.getCreated());
          writer.append(df.format(created));
          writer.append(';');

          Date modified = new Date(u.getModified());
          writer.append(df.format(modified));
          writer.append(';');

          GameUserAnalytics gua = dao.getAnalyticsForUser(u.getUuid(), null);

          writer.append("" + gua.getMyGames());
          writer.append(';');
          writer.append("" + gua.getMyFavouritedGames());
          writer.append(';');
          writer.append("" + gua.getMySharedGames());
          writer.append(';');

          writer.append("" + gua.getMyGamesFavourites());
          writer.append(';');
          writer.append("" + gua.getMyGamesShares());
          writer.append(';');

          writer.append("" + gua.getPlays());
          writer.append(';');
          writer.append("" + gua.getPlayers());
          writer.append(';');
          writer.append("" + gua.getAnswers());
          writer.append("\n");
        }

        if (lastCursor == null) {
          finished = true;
        }
      }

      writer.flush();
    }
  };
  return Response.ok(stream).type("text/csv")
      .header("Content-Disposition", "attachment; filename=\"users.csv\"").build();
}

A function used to export users. It builds a CSV file with all activated users and other data connected to them.


Appendix B

Refactored Code

All of the code in this appendix is presented with permission from the owner. To reuse or copy the code, permission must be granted by the owner.

B.1 Create User

Listing B.1: Refactored: Create User

public User createUser(User newUser, String ip) {
  String password = newUser.getPassword();

  validateUserFields(newUser);
  setUserFields(newUser, ip);
  AuthUser authUser = createAuthUser(newUser);

  dao.saveAuthUser(authUser);
  dao.saveUser(newUser);

  firstTimeLogin(newUser, password);
  return newUser;
}

This is the result after extracting almost all of the logic in the original createUser, listing A.1. The result is a much smaller, tidier function that clearly shows the process of creating a user.


B.2 Update User

Listing B.2: Refactored: Update User

public User updateUser(String userId, AuthorisableUser authorisableUser, User updatedUser) {
  String uuid = authorisableUser.getUserUuid();
  logger.debug(append("method", "updateUser"), "uuid={}, userId={}", uuid, userId);
  throwIfNotAuthorized(uuid, userId, authorisableUser);

  User oldUser = getUserByUUID(userId);
  validateUpdatedUserFields(oldUser, updatedUser);

  oldUser.clone(updatedUser);
  dao.updateUser(oldUser);

  oldUser.setShowMetadata(GenericBaseWithMetadata.MetadataVisibility.OWNER);
  return oldUser;
}

After extracting almost all of the logic in the updateUser function, this is the result. The original code is shown in listing A.2.

B.3 Get User

Listing B.3: Refactored: Get User

public User getUser(String userId, AuthorisableUser authorisableUser) {
  String uuid = authorisableUser.getUserUuid();
  logger.debug(append("method", "getUser"), "uuid={}, userId={}", uuid, userId);
  User user = findUser(userId);

  throwIfNotAuthorized(uuid, user.getUuid(), authorisableUser);

  return user;
}

When both the authorization and the user lookup are extracted into their own functions, this is what is left of getUser.


B.4 Get a User By UUID

Listing B.4: Refactored: Get a User By UUID

private User getUserByUUID(String userId) {
  User user = dao.getUserByUUID(userId);

  if (user == null) {
    logger.info(append("method", "getUserByUUID"), "Could not find user with userId={}", userId);
    Validator.throwIfInvalidUUID(userId);
    throw new ValidationException(new Error("User not found!", "User not found!",
        "User not found!", 404));
  }

  logger.debug(append("method", "getUserByUUID"), "user by uuid={}", userId);
  return user;
}

This is the result of extracting the frequently used logic of fetching a user object from the database; the original is listing A.4. The refactored version is a simple method with a null check to see whether a user was found. If no user was found, the function throws an exception instead of returning null.

B.5 Authorize User

Listing B.5: Refactored: Authorize User

private void throwIfNotAuthorized(String sessionUUID, String userUUID,
    AuthorisableUser authorisableUser) {
  if (sessionUUID.equals(userUUID) || authorisableUser.isAdmin()) {
    logger.debug(append("method", "throwIfNotAuthorized"), "userId={} is authorized", userUUID);
  } else {
    logger.debug(append("method", "throwIfNotAuthorized"), "userId={} is NOT authorized", userUUID);
    throw new AuthenticationException(
        "You are not authenticated or missing privileges, please try again!");
  }
}

Fetching a user object and authorization are done in almost every function. This is the result of extracting the authorization into a separate function.


B.6 Find User

Listing B.6: Refactored: Find User

private User findUser(String userId) {
  User user;

  try {
    user = getUserByUUID(userId);
  } catch (ValidationException e) {
    user = getUserByUsername(userId);
  }
  return user;
}

The function getUser, listing A.3, tries to find a user by both UUID and username. We extracted this logic into its own function, which takes advantage of the exception thrown when a user is not found by UUID. The exception is caught, and the function tries to find a user by username instead. If this also fails, the exception bubbles upward.

B.7 Mixed Validation

Listing B.7: Refactored: Mixed Validation

private void validateUserFields(User newUser, boolean creatingUser) {
  logger.debug(append("method", "checkUserFields"), "user={}, creatingUser={}",
      newUser, creatingUser);

  boolean changingUsername = true;
  boolean changingEmail = true;
  User oldUser = null;
  if (!creatingUser) {
    oldUser = findUser(newUser.getUuid());
    changingUsername = !newUser.getUsername().equals(oldUser.getUsername());
    changingEmail = !newUser.getEmail().equals(oldUser.getEmail());
  }

  Validator.throwIfInvalidEmail(newUser.getEmail());
  Validator.throwIfInvalidUsername(newUser.getUsername());

  if (creatingUser) {
    if (newUser.getPassword() == null) {
      logger.info(append("method", "validateUserFields"), "Request without a password");
      throw new ValidationException(new Error("Validation", "ValidationException",
          "Sorry, you are missing a password!"));
    }
    Validator.throwIfInvalidPassword(newUser.getPassword());
  }

  if (!creatingUser && !oldUser.getUuid().equals(newUser.getUuid())) {
    logger.warn(append("method", "validateUpdatedUserFields"), "User trying to change uuid!");
    throw new ValidationException(new Error("Invalid action!"));
  }

  if (changingUsername && dao.getUserByUsername(newUser.getUsername(), false) != null) {
    logger.info(append("method", "validateUpdatedUserFields"), "Username not available!");
    throw new ValidationException(new Error("Requested username is already in use!"));
  }

  if (changingEmail && dao.getUserByEmail(newUser.getEmail(), false) != null) {
    logger.info(append("method", "validateUpdatedUserFields"), "Email not available!");
    throw new ValidationException(new Error("Requested email is already in use!"));
  }

  if (newUser.getBirthday() != null && newUser.getBirthday().after(new Date())) {
    logger.info(append("method", "validateUpdatedUserFields"), "Invalid birthday, is larger than today!");
    throw new ValidationException(new Error("Birthday is not valid!"));
  }
}

This was our first attempt at refactoring the validation of fields when a user is updated or created, trying to reduce duplication as much as possible. It therefore needs the boolean parameter creatingUser to decide its behavior. In the end we deemed the function too complex and decided to split it into two separate functions.

B.8 Validate User Fields After Creation

Listing B.8: Refactored: Validate User Fields After Creation

private void validateUserFields(User user) {
  if (user.getTitle() != null)
    Validator.throwIfInvalidTitle(user.getTitle());

  if (user.getPassword() == null) {
    logger.info(append("method", "validateUserFields"), "Request without a password");
    throw new ValidationException(
        new no.cleanCoders.rest.resources.entities.Error("Validation", "ValidationException",
            "Sorry, you are missing a password!"));
  }

  Validator.throwIfInvalidPassword(user.getPassword());
  Validator.throwIfInvalidBirthday(user.getBirthday());
  throwIfEmailInUse(user.getEmail());
  throwIfUsernameInUse(user.getUsername());
}


One of the two functions we created for validating fields. This one is used when a user is created.

B.9 Validate User Fields After Update

Listing B.9: Refactored: Validate User Fields After Update

private void validateUpdatedUserFields(User oldUser, User updatedUser) {
  boolean changingUsername = !updatedUser.getUsername().equals(oldUser.getUsername());
  boolean changingEmail = !updatedUser.getEmail().equals(oldUser.getEmail());

  if (!oldUser.getUuid().equals(updatedUser.getUuid())) {
    logger.warn(append("method", "validateUpdatedUserFields"), "User trying to change uuid!");
    throw new ValidationException(new Error("Invalid action!"));
  }

  if (changingUsername) {
    throwIfUsernameInUse(updatedUser.getUsername());
  }

  if (changingEmail) {
    throwIfEmailInUse(updatedUser.getEmail());
  }

  Validator.throwIfInvalidBirthday(updatedUser.getBirthday());
}

One of the two functions we created for validation of fields. This is the one used when a user has been updated.

B.10 User Pager

Listing B.10: Refactored: User Pager

public class UserPager {
    public List<UserList> pages;

    public UserPager(DataAccessInterface dao) {
        pages = new ArrayList<UserList>();
        loadPages(dao);
    }

    public void loadPages(DataAccessInterface dao) {
        boolean finished = false;
        String lastCursor = null;
        UserList users;

        while (!finished) {
            users = dao.getAllUsers(200, lastCursor, false);
            lastCursor = users.getCursor();
            pages.add(users);

            if (lastCursor == null) {
                finished = true;
            }
        }
    }

    public Iterator<UserList> getPageIterator() {
        return pages.iterator();
    }
}

We created this class to separate responsibility from the exportActivatedUsers function and other functions looping over users in the database. The function loadPages loads all pages of users into an ArrayList, which getPageIterator then exposes as an iterator.
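The cursor-based paging loop in loadPages can be exercised in isolation. The sketch below re-implements the same loop shape against simplified stand-ins for UserList and the DAO; these stub types, the two-argument getAllUsers, and the index-as-string cursor are illustrative assumptions, not the thesis's real API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified stand-in for the thesis's UserList: one page of results plus
// the cursor for the next page (null marks the last page).
class UserList {
    private final List<String> users;
    private final String cursor;

    UserList(List<String> users, String cursor) {
        this.users = users;
        this.cursor = cursor;
    }

    String getCursor() { return cursor; }
    List<String> getUsers() { return users; }
}

// Simplified stand-in for DataAccessInterface, paging over an in-memory list.
class FakeDao {
    private final List<String> all;

    FakeDao(List<String> all) { this.all = all; }

    UserList getAllUsers(int limit, String cursor) {
        int start = cursor == null ? 0 : Integer.parseInt(cursor);
        int end = Math.min(start + limit, all.size());
        String next = end < all.size() ? Integer.toString(end) : null;
        return new UserList(all.subList(start, end), next);
    }
}

public class PagerSketch {
    // Same loop shape as UserPager.loadPages: fetch a page, remember its
    // cursor, and stop once the cursor comes back null.
    static List<UserList> loadPages(FakeDao dao, int pageSize) {
        List<UserList> pages = new ArrayList<>();
        String lastCursor = null;
        boolean finished = false;

        while (!finished) {
            UserList page = dao.getAllUsers(pageSize, lastCursor);
            lastCursor = page.getCursor();
            pages.add(page);

            if (lastCursor == null) {
                finished = true;
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        List<String> users = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            users.add("user" + i);
        }
        // 5 users with a page size of 2 yields pages of 2, 2 and 1.
        System.out.println(loadPages(new FakeDao(users), 2).size()); // prints 3
    }
}
```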

B.11 Users Exporting Own Data

Listing B.11: Refactored: Users Exporting Own Data

public String exportUserData(Writer writer) throws IOException {
    DateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd");
    String data = "";

    if (getName() != null) {
        writer.append(getName().replace(';', ' '));
    }
    writer.append(";");
    writer.append(getUsername());
    writer.append(";");
    writer.append(getEmail());
    writer.append(";");

    if (getOrganisation() != null) {
        writer.append(getOrganisation().replace('"', ' '));
    }
    writer.append(";");
    writer.append(getPrimaryUsage());
    writer.append(";");
    writer.append("" + getCreated());
    writer.append(";");
    writer.append(dateFormat.format(getModified()));
    writer.append(";");
    return data;
}

To separate more responsibility from the exportActivatedUsers function we moved the writing of data to the user object.
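The delimiter handling in exportUserData, blanking ';' inside free-text fields so a name cannot break the CSV column layout, can be isolated into a small helper. The appendField helper and the example values below are illustrative, not part of the thesis's code.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

public class CsvFieldSketch {
    // Appends one field followed by the ';' delimiter. Delimiter characters
    // inside the value are blanked out, mirroring exportUserData's getName()
    // handling; a null value becomes an empty field, as when getName() is null.
    static void appendField(Writer writer, String value) throws IOException {
        if (value != null) {
            writer.append(value.replace(';', ' '));
        }
        writer.append(";");
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        appendField(out, "Smith; Jr.");       // delimiter inside the value is blanked
        appendField(out, null);               // missing value becomes an empty field
        appendField(out, "smith@example.com");
        System.out.println(out); // prints "Smith  Jr.;;smith@example.com;"
    }
}
```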


B.12 CSV-Exporter

Listing B.12: Refactored: CSVExporter

public class CSVExporter {

    DataAccessInterface dao;

    public CSVExporter(DataAccessInterface dao) {
        this.dao = dao;
    }

    public void exportActivatedUsersToWriter(Writer writer, Iterator<UserList> userPageIterator)
            throws IOException {
        writer.append("Full name;Username;Email;Org;Primary usage;Created;Last use (builder);"
            + "Games created;Games favourited;Games shared;Games been favourited;"
            + "Games been shared;Games been played;Players on users Games;Answers on users Games\n");

        UserList users;
        while (userPageIterator.hasNext()) {
            users = userPageIterator.next();

            for (User user : users.getUsers()) {
                user.exportUserData(writer);
                GameUserAnalytics gameUserAnalytics = dao.getAnalyticsForUser(user.getUuid(), null);
                gameUserAnalytics.exportData(writer);
                writer.append("\n");
            }
        }
    }
}

We moved the entire responsibility of exporting data to a new class we called CSVExporter. This class can be expanded to export other data than just user data to CSV.

B.13 Test UpdatePassword

Listing B.13: Refactored: Test UpdatePassword

@Test(expected = ValidationException.class)
public void testUpdatePasswordThrowInvalidNewPassword() {
    password.setNewPassword("1");
    userService.updateUsersPassword(dummyUser.getUuid(), authorisableUser, password);
}


One of the unit tests we wrote to test the new updatePassword function. This particular test makes sure that users cannot update their password to an invalid one.

B.14 Test getUserWithUsername

Listing B.14: Test getUserWithUsername

@Test
public void testGetUserWithUsername() {
    when(dao.getUserByUsername(dummyUser.getUsername(), true)).thenReturn(dummyUser);

    User user = userService.getUser(dummyUser.getUsername(), authorisableUser);
    assert(user.getUsername().equals(dummyUser.getUsername()));
}

One of the unit tests we wrote when creating the getUserByUsername function.
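The when(...).thenReturn(...) call in this listing is Mockito-style stubbing. Its mechanics can be mimicked without the library by writing the stub by hand, which makes explicit what the test relies on; the UserLookup interface and the values below are simplified illustrations, not the thesis's real DAO and service types.

```java
// Simplified stand-in for the DAO interface being stubbed.
interface UserLookup {
    String getUserByUsername(String username);
}

public class StubTestSketch {
    // Hand-written equivalent of when(dao.getUserByUsername(expected))
    // .thenReturn(cannedUser): the stub answers only for the expected input.
    static UserLookup stubReturning(String expectedUsername, String cannedUser) {
        return username -> expectedUsername.equals(username) ? cannedUser : null;
    }

    public static void main(String[] args) {
        UserLookup dao = stubReturning("henning", "henning");

        // The service under test would call the stub like this:
        String found = dao.getUserByUsername("henning");
        System.out.println("henning".equals(found)); // prints true
    }
}
```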

