Book cover

Buy on Leanpub (pdf and epub)

To report errors or typos, use this form.

Home | Dark Mode | Cite

Software Engineering: A Modern Approach

Marco Tulio Valente

9 Refactoring

I’m not a great programmer; I’m just a good programmer with great habits. – Kent Beck

This chapter begins with an introduction to refactorings, which are modifications made to a program to make it easier to understand and modify. Section 9.2 presents a series of refactoring operations, including code examples, some of which are derived from real refactorings performed on open-source projects. In Section 9.3, we discuss various aspects of the practice of refactoring, including the importance of having a robust unit test suite. Section 9.4 explores the support provided by IDEs for automated refactoring. Finally, Section 9.5 introduces a list of code smells, which are indicators that a particular code structure may benefit from refactoring.

9.1 Introduction

As discussed in the previous chapter, software, like any engineering product, needs to be tested. Similarly, software also requires maintenance. In fact, as mentioned in Chapter 1, there are various types of maintenance that can be performed on software systems. For instance, when a bug is detected, we must apply corrective maintenance. When users or the Product Owner request a new feature, we need to conduct evolutionary maintenance. When a business rule or a technology used by the system changes, we must allocate time to perform adaptive maintenance.

Furthermore, software systems, like living beings, also age. In the early 1970s, Meir Lehman, while working at IBM, studied this phenomenon and subsequently formulated a set of empirical laws concerning aging, internal quality, and the evolution of software systems, which became known as the Software Evolution Laws or simply Lehman’s Laws. The first two laws formulated by Lehman were:

Lehman’s first law justifies the need for adaptive and evolutionary maintenance. It also indicates that systems can die, meaning there may come a time when it is more advantageous to replace a system with a new one. The second law states that maintenance makes the code and internal structure of a system more complex and harder to maintain over time. In other words, a system’s internal quality naturally deteriorates as it undergoes maintenance and evolution. However, this law also includes a caveat: this deterioration can be prevented if specific work is done to maintain internal quality. Today, this work is called refactoring.

Refactorings are code transformations that improve a system’s maintainability without affecting its execution. To explain this definition, let’s divide it into three parts. First, when the definition mentions code transformations, it refers to modifications in the code, such as splitting a function into two, renaming a variable, moving a function to another class, or extracting an interface from a class, among others. Next, the definition mentions the goal of such transformations: to improve the maintainability of the system, which means to improve its modularity, design, or architecture; enhance its testability; and make the code more readable and easier to understand and modify, among other benefits. Finally, a restriction is set: it is not worth improving maintainability at the cost of changing the program’s results. In other words, refactoring must ensure that the system works exactly as it did before the transformation. Alternatively, we can say that refactorings must preserve the system’s behavior.

However, in the 1970s and 1980s when Lehman’s Laws were formulated, the term refactoring was not yet used. One of the first uses of this term occurred in 1992, in a doctoral thesis defended by William Opdyke at the University of Illinois (link). In 1999, refactoring, now known by this name, was included among the programming practices advocated by Extreme Programming. In the first edition of the XP book, it was recommended that developers perform refactorings with the goal of restructuring their systems to remove duplication, improve communication among developers, and simplify or make the design more flexible.

In 2000, Martin Fowler released the first edition of a book dedicated to refactoring, which was very successful and contributed to popularizing this practice. One of the reasons for this success was that the book presented a catalog with dozens of refactorings. In a style resembling a catalog of design patterns (as studied in Chapter 6), the presentation of the refactorings started by naming them, thus helping to create a vocabulary for refactoring. Fowler also presented the mechanics of each refactoring, included code examples, and discussed the benefits and potential drawbacks.

In his book, Fowler mentions Kent Beck’s phrase that opens this chapter and emphasizes the importance of following good programming habits, which are essential for maintaining a program’s health and ensuring it continues to provide value over the years. As a result, developers should perform not only corrective, adaptive, and evolutionary maintenance but also frequent refactorings to ensure the system’s maintainability.

Note: The term refactoring has become common in software engineering. Therefore, it is sometimes used to indicate improvements in other non-functional requirements, besides maintainability. For example, we often hear developers say they will refactor their code to improve performance, introduce concurrency, or improve the usability of the interface. However, in this book, we use the term in its original, restricted sense—referring only to modifications that improve the maintainability of the source code.

9.2 Examples of Refactorings

In this section, we will present some widely used refactorings. For each refactoring, we will present its name, motivation, and implementation mechanics. Furthermore, we will illustrate the presentation with examples from real refactorings performed by open-source developers.

9.2.1 Extract Method

Extract Method is one of the most important refactorings, as it aims to extract a piece of code from method f and move it to a new method g. Subsequently, f will include a call to g. The following example illustrates this refactoring. First, here’s the code before the refactoring:

void f() {
  ... // A
  ... // B 
  ... // C
}

And here’s the code after the extraction of method g:

void g() {  // extracted method
  ... // B
}

void f() {
  ...  // A
  g();
  ...  // C
}

There are also variations in the extraction mechanics. For instance, it is possible to extract multiple methods, denoted as g1, g2, …, gn, from a single method f all at once. Additionally, we can extract the same code g from multiple methods, denoted as f1, f2, …, fn. In this case, the extraction is used to eliminate code duplication, since the code in g is repeated in several methods.

To perform an Extract Method refactoring, we may need to pass parameters to the new method. This is necessary, for example, if the extracted method uses local variables from the original method. Additionally, the extracted method should return values if they are used later by the original method. As a last example, if local variables are only used in the extracted code, we should extract their declarations as well.

Extract Method is often referred to as the Swiss Army knife of refactorings due to its versatility and wide range of applications. For instance, it can be employed to break down a large method into smaller ones, each serving a specific goal that is reflected in its name. This approach also enhances the comprehensibility of the original method, as its body after the refactoring consists of only a sequence of calls. An example is provided below and other applications of Extract Method are discussed in a following section.

Example: Here, we provide a real example of Extract Method from an Android system. This system has an onCreate method, which uses SQL commands to create the database tables used by the system. The code for the initial version of onCreate—after a few edits, simplifications, and removal of comments—is presented below. The original method exceeded 200 lines.

void onCreate(SQLiteDatabase database) {// before extraction
  // creates table 1
  database.execSQL("CREATE TABLE " +
            CELL_SIGNAL_TABLE + " (" + COLUMN_ID +
            " INTEGER PRIMARY KEY AUTOINCREMENT, " + ...
  database.execSQL("CREATE INDEX cellID_index ON " + ...);
  database.execSQL("CREATE INDEX cellID_timestamp ON " +...);

  // creates table 2
  String SMS_DATABASE_CREATE = "CREATE TABLE " + 
            SILENT_SMS_TABLE + " (" + COLUMN_ID +
            " INTEGER PRIMARY KEY AUTOINCREMENT, " + ...
  database.execSQL(SMS_DATABASE_CREATE);
  String ZeroSMS = "INSERT INTO " + SILENT_SMS_TABLE + 
            " (Address,Display,Class,ServiceCtr,Message) " +
            "VALUES ('"+ ...
  database.execSQL(ZeroSMS);

  // creates table 3
  String LOC_DATABASE_CREATE = "CREATE TABLE " +      
            LOCATION_TABLE + " (" + COLUMN_ID +
            " INTEGER PRIMARY KEY AUTOINCREMENT, " + ...
  database.execSQL(LOC_DATABASE_CREATE);

  // more than 200 lines, creating other tables
}

To enhance this method’s clarity, a developer decided to perform seven Extract Method operations. Each extracted method became responsible for creating one of the tables. By examining the names of these methods, we can easily infer the tables they create. After these extractions, the onCreate method only calls the extracted methods, as shown below. The method’s size has been reduced from over 200 lines to just seven lines. It’s also worth noting that each extracted method has a parameter representing the database where the tables should be created.

public void onCreate(SQLiteDatabase database) { 
  createCellSignalTable(database);
  createSilentSmsTable(database);
  createLocationTable(database);
  createCellTable(database);
  createOpenCellIDTable(database);
  createDefaultMCCTable(database);
  createEventLogTable(database);
 }

Extract Method Motivations

In 2016, along with Danilo Silva and Prof. Nikolaos Tsantalis from Concordia University in Montreal, we conducted a study with GitHub developers to uncover their motivations for performing refactorings (link). In this study, we identified 11 distinct motivations for method extractions, as shown in the following table.

Motivations for Extract Method Occurrences
Extracting a method that will be called by other methods 43
Introducing an alternative signature for an existing method 25
Breaking a method into smaller parts to improve understanding 21
Extracting a method to remove code duplication 15
Extracting a method to facilitate the implementation of a new feature or to fix a bug 14
Extracting a method with a better name or fewer parameters 6
Extracting a method to test its code 6
Extracting a method that will be overridden by subclasses 4
Extracting a method to allow recursive calls 2
Extracting a method that will work as a factory 1
Extracting a method that will run on a thread 1

Analyzing this table, we can see that the primary motivation is to extract code for reuse. Developers often come across a piece of code they need, only to find it is already implemented within a method f. To make this code accessible, they perform an Extract Method, creating a new method g. This allows them to call g in the code they are working on. Here’s an example of how one developer reported this motivation for extracting a method:

These refactorings were made because of code reusability. I needed to use the same code in a new method. I always try to reuse code, because when there’s a lot of code redundancy it gets overwhelmingly more complicated to work with the code in future, because when something changes in code that is duplicated somewhere, it usually needs to be changed there also.

As seen in the previous table, the second most frequent motivation is introducing an alternative signature for a method. An example is shown below, involving a method that logs a string to a file. We first show the code before the refactoring:

void log(String msg) {
  // saves msg to file
}

The developer decided to provide an alternative version of this method with a boolean parameter, indicating whether the string should also be printed on the console. Here is the code after extracting this second log method:

void log(String msg, boolean console) {
  if (console)
     System.out.println(msg);
  // saves msg to a file
}

void log(String msg) {
  log(msg, false);
}

It’s worth noting that, technically, this transformation is an Extract Method. Initially, the log(String) code was extracted to log(String, boolean), creating a new method. Subsequently, minor adjustments were made to the extracted code, including the addition of an if statement. Such adjustments are common and do not disqualify the transformation as a method extraction.

9.2.2 Inline Method

This refactoring operates in the opposite direction of method extraction. Consider a method with only one or two lines of code that is rarely called. Thus, the benefits offered by this method, both in terms of reuse and organization of the code, are minimal. Consequently, it can be removed from the system, and its body can be incorporated into the call sites. It’s important to note that Inline Method is a less common and less significant operation compared to Extract Method.

Example: The following code shows an example of Inline Method. First, here is the code before inlining. We can observe that the writeContentToFile method has a single line of code and is called only once by the write method.

private void writeContentToFile(final byte[] revision) {
  getVirtualFile().setBinaryContent(revision);
}
 
private void write(byte[] revision) {
  VirtualFile virtualFile = getVirtualFile();
  ...
  if (document == null) {
     writeContentToFile(revision);
  }
   ...
 }

The developers decided to remove the writeContentToFile method and incorporate its body into the single call site. Here is the code after the refactoring:

private void write(byte[] revision) {
  VirtualFile virtualFile = getVirtualFile();
  ...
  if (document == null) {
     virtualFile.setBinaryContent(revision); // after inline
  }
  ...
}

9.2.3 Move Method

It is not uncommon to find a method implemented in the wrong class. This occurs when a method f is located in a class A, but it uses more services from class B. For instance, it may have more dependencies on elements of B than on elements of its original class. In such cases, moving f to B should be considered. This refactoring can improve the cohesion of A, reduce the coupling between A and B, and ultimately make both classes easier to understand and modify.

Due to these characteristics, Move Method is one of the refactorings with the highest potential to improve modularity. Since its effects extend beyond a single class, this refactoring can have a positive impact on the system’s architecture, ensuring that methods are placed in the appropriate classes, from both functional and architectural viewpoints.

Example: Consider the code of an IDE. In this code, we have a class called PlatformTestUtil, which is responsible for a specific feature provided by the IDE. The details of this feature are not important for our discussion here, so we will omit them. For our purposes, the key point is that this class has a method called averageAmongMedians (see below), which calculates the average of the medians stored in an array. However, this method has no relation to the other methods in PlatformTestUtil. For example, it neither calls methods nor uses attributes of PlatformTestUtil. In summary, it is independent of the rest of the class.

class PlatformTestUtil {
  ...
  public static long averageAmongMedians(long[] time, int part) {
    int n = time.length;
    Arrays.sort(time);
    long total = 0;
    for (int i= n/2-n/part/2; i< n/2+n/part/2; i++) {
      total += time[i];
    }
    int middlePartLength = n/part;
    return middlePartLength == 0 ? 0 :total / middlePartLength;
  }
  ...
}

Therefore, the developers of this system decided to move averageAmongMedians to a class called ArrayUtil, whose purpose is to provide general functions for array manipulation. Thus, this class is more related to the service provided by averageAmongMedians. After the Move Method, the existing calls to averageAmongMedians should be updated, as shown in the following figure.

Updating static method calls after a Move Method. averageAmongMedians was moved from PlatformTestUtil to ArrayUtil.

As observed in this figure, this update was straightforward because averageAmongMedians is a static method. Therefore, only the class name needed to be changed at each call site.

However, in other cases, updating the calls after a Move Method refactoring is not straightforward. This occurs when there are no references to objects of the target class in the call sites. In such cases, one possible solution is to leave a simple implementation of the method in the original class, which forwards the calls to the new method’s class. This way, the existing calls do not need to be updated. An example is shown below. First, here is the original code:

class A {
  B b = new B();
  void f() { ... }
}

class B { ... }

class Client {
  A a = new A();
  void g() {
    a.f(); 
    ...
  }
}

And here is the code after the refactoring:

class A {
  B b = new B();

  void f() {
    b.f(); // forwards the call to B
  }
}

class B {    // f was moved from A to B
  void f() { ... }
}

class Client {
  A a = new A();
  void g() { 
    a.f();    // no changes required
    ...
  }
}

Note that f was moved from class A to class B. However, in class A, we left a version of the method that simply forwards the call to B. Thus, the Client class doesn’t require any updates.

When the Move Method refactorings occur within the same hierarchy of classes, they receive special names. For example, when we move a method from a subclass to a superclass, it is called a Pull Up Method. To illustrate, consider a scenario where the same method f is implemented in subclasses B1 and B2. In this case, we can avoid the duplicated implementation by pulling up each implementation to the superclass A, as shown below.

Pull Up Method

Conversely, when a method is moved down the class hierarchy, from a superclass to a subclass, we call it a Push Down Method refactoring. For instance, if a method f is implemented in superclass A, but it is relevant to only one subclass, say B1, we can push down its implementation to B1, as shown in the following figure.

Push Down Method

To conclude, refactoring operations can be performed in sequence. For example, consider class A with a method f:

class A {
  B b = new B();

  void f() {
    S1;
    S2;
  }
}

class B { ... }

First, from f, we extract a method g containing the statement S2:

class A {
  B b = new B();

  void g() { // extracted method
    S2;
  }

  void f() {
    S1;
    g();
  }
}

class B { ... }

Next, we move g to class B, as shown below:

class A {
  B b = new B();

  void f() {
    S1;
    b.g();
  }
}

class B {

  void g() { // moved method
    S2;
  }
}

9.2.4 Extract Class

This refactoring is recommended when a class A has many responsibilities and attributes. If some of these attributes are related, they can be grouped in another location. In other words, they can be extracted into a new class B. After the extraction, we should declare an attribute of type B in class A.

Example: Let’s consider the class Person shown below. In addition to other attributes, which have been omitted from the example, it stores the person’s phone numbers, including area code and number.

class Person {
  String areaCode;
  String phone;
  String alternativeAreaCode;
  String alternativePhone;
  ...
}

A new class Phone can be extracted from Person, as shown below. After the refactoring, Person receives two attributes of this new type.

class Phone { // extracted class
  String areaCode;
  String number; 
}


class Person {
  Phone phone;
  Phone alternativePhone;
  ...
}

A similar refactoring is called Extract Interface. For example, consider a library with classes LinkedList and ArrayList. From these classes, we can extract an interface List, with the methods common to both. Then, the library’s clients can apply the Prefer Interfaces to Concrete Classes design principle, which we studied in Chapter 5.

9.2.5 Renaming

There’s a provocative phrase attributed to Phil Karlton that says There are only two hard things in computer science: cache invalidation and naming things. As naming is hard, we often have to rename code elements, including variables, functions, methods, parameters, attributes, classes, etc. This may happen because the initial name of the element was not a good choice. Alternatively, the responsibility of this element may have changed over time, making its name outdated. In both cases, it is recommended to perform one of the most popular refactorings: renaming—that is, giving a more appropriate and meaningful name to a code element.

However, the most complex part of this refactoring is not changing the name of the element, but updating the parts of the code where it is referenced. For example, if a method f is renamed to g, all calls to f must be updated. In cases where f is widely used, it might be beneficial to extract its body to a new method, with the new name, and keep the old method as deprecated.

To illustrate, suppose we have the following method f:

void f () {
  // A
}

Next, we show the code after renaming f to g. Note that we have kept a deprecated version of f in the code, which simply calls g.

void g() {    // new name 
  // A
}


@Deprecated
void f() {    // old name
  g();        // forwards call to new method
}

Deprecation is a mechanism provided by programming languages to indicate that a code element is outdated and, therefore, should not be used anymore. When the compiler detects a piece of code using a deprecated element, it issues a warning. As illustrated in the previous example, the method f was not simply renamed to g. Instead, first, a method g was created with f’s original code. Then, f was deprecated, and its code was modified to only call g.

This deprecation-based strategy makes renaming safer, as it does not require updating all the calls to f at once. Instead, it gives clients time to adapt to the change and start using the new name. Indeed, refactorings—even the simplest ones, like renaming—should be performed in small steps or baby steps to avoid introducing bugs in previously working code.

9.2.6 Other Refactorings

The refactorings presented earlier are those with the highest potential for improving a system’s design, as they involve operations with a global scope, such as Move Method or Extract Class. However, there are refactorings with a narrower scope that improve, for example, the internal implementation of a single method. In this section, we briefly describe some of these refactorings.

Extract Variable is used to simplify expressions and make them easier to read and understand. Here’s an example:

x1 = (-b + sqrt(b*b - 4*a*c)) / (2*a);

This code can be refactored to:

delta = b*b - 4*a*c; // extracted variable
x1 = (-b + sqrt(delta)) / (2*a);

In this refactoring, a new variable delta was created and initialized with part of the larger expression. As a result, the expression, after the refactoring, became shorter and easier to understand.

Remove Flag is a refactoring that recommends using statements like break or return, instead of control variables (or flags). Here’s a code example:

boolean search(int x, int[]a) {
   boolean found = false;
   int i = 0;
   while ((i < a.length) && (!found)) {
     if (a[i] == x) 
        found = true;
     i++;
   }
   return found;
}

This code can be refactored as follows:

boolean search(int x, int[] a) {
  for (int i = 0; i < a.length; i++)
    if (a[i] == x)
       return true;
  return false;
}

After this refactoring, the function became shorter and cleaner, due to the use of a return statement to exit immediately once the searched value is found. As a result, the flag variable found has been removed.

There are also refactorings designed to simplify conditional statements. One of them is called Replace Conditional with Polymorphism. To understand it, consider a switch statement that returns the value of a student’s research grant, depending on the student’s type:

switch (student.getType()) {
  case "undergraduate": 
    grant = 750;
    break;
  case "master": 
    grant = 2000;
    break;
  case "phd": 
    grant = 3000;
    break;
}

In object-oriented languages, this statement can be refactored to a single line of code:

double grant = student.getGrant();

In the refactored version, the getType() method of Student is no longer necessary, so it can be removed. However, we must implement the getGrant() method in the Student’s subclasses—for example, UndergraduateStudent, MasterStudent, and PhDStudent. In the Student superclass, this method must be declared as abstract, requiring it to have only a signature, without a body.

Remove Dead Code recommends deleting methods, classes, variables, or attributes that are no longer used. For example, in the case of a method, there might be no calls to it anymore. In the case of a class, it might not be instantiated or inherited by any other classes. In the case of an attribute, it might not be used within the class body, nor in subclasses or in other classes. This may seem like an uncommon refactoring, but in large systems, maintained over years by dozens or even hundreds of developers, there is often a considerable amount of code that is no longer used.

9.3 Refactoring Practice

After describing various refactorings in the previous section, let’s now discuss how to adopt the practice of refactoring in real software projects.

First, successful refactoring depends on the existence of good tests, particularly unit tests. Without tests, making changes in a system becomes risky, especially when these changes do not add new functionalities or correct bugs, as is the case with refactorings. John Ousterhout emphasizes the importance of tests during refactoring activities (link, Section 19.3):

Tests, particularly unit tests, play an important role in design because they facilitate refactoring. Without a test suite, it’s dangerous to make major structural changes to a system. […] As a result, developers avoid refactoring in a system without good test suites; they try to minimize the number of code changes for each new feature or bug fix, which means that complexity accumulates and design mistakes don’t get corrected.

The second important issue concerns the timing of refactoring. There are two main approaches to refactoring: opportunistic and strategic.

Opportunistic refactorings are carried out during a programming task when we discover that a piece of code is not well implemented and can be improved. This can happen when correcting a bug or implementing a new feature. For instance, during these tasks, we might notice that a method’s name is unclear, a method is too long and difficult to understand, a conditional statement is too complex, or some code is no longer used. Whenever we detect such problems with a piece of code, we should refactor it as soon as possible. To provide some concrete guidance, suppose you will spend an hour implementing a new feature. In this case, it is desirable to invest part of that time—perhaps 20% or more—in refactorings. Kent Beck has an interesting quote about opportunistic refactorings:

For each desired change, make the change easy (warning: this may be hard), then make the easy change.

The underlying premise of this recommendation is that we may be struggling to implement a change precisely because the code is not prepared to accommodate it. Therefore, we should first take a step back and refactor the code to make the change easier. Once this is done, we can then take two steps forward and more easily implement the change we were assigned.

Most refactorings are opportunistic. However, it is also possible to have planned refactorings. These typically involve time-consuming and complex changes that cannot be accommodated within a typical development task. Instead, they should be carried out in scheduled and dedicated sessions. For instance, such refactorings might involve breaking a package into two or more subpackages. Another example is when teams have neglected the practice of refactoring for a long time. In such cases, with numerous pending refactorings, it’s better to schedule them for a specific time period.

9.4 Automated Refactorings

Today, most IDEs provide support to automate refactorings in the following manner: developers select the block of code they intend to refactor and choose the refactoring operation they wish to perform. The IDE then performs this operation automatically. The following figures illustrate an automated renaming via an IDE. First, in this example, the developer selects the m1 method:

Next, the developer selects the Refactor and Rename options:

The IDE then prompts for the new method name (see the following figure). In the same dialog box, the developer can request to update the references to this method.

Finally, the IDE executes the refactoring:

As we can see, the method has been renamed to m11. Furthermore, the calls in m3 and m4 have been updated to use this new name.

Although it’s referred to as an automated refactoring, the example shows that developers are still responsible for specifying the code they want to refactor and the refactoring operation they intend to perform. Moreover, developers must provide specific information about the refactoring, such as the new name in the case of renaming. Only after these inputs are provided does the refactoring process become automated.

Before applying the refactoring, the IDE checks whether its preconditions are satisfied, i.e., whether the refactoring will not result in a compilation error or change the program’s behavior. In the previous example, if the user attempts to rename m1 to m2, the IDE would inform them that this refactoring cannot be performed, as the class already has a method named m2.

9.4.1 Checking Refactoring Preconditions

The task of checking the preconditions of a refactoring, in most cases, is not simple. To illustrate the complexity of this task, let’s examine a small Java program proposed by Friedrich Steimann and Andreas Thies (link). The program below has two classes, A and B, implemented in separate files but belonging to the same package P1. Therefore, the call to m("abc") in the first file executes the m(String) method in class B.

// File A.java
package P1;

public class A {
  void n() { 
    new B().m("abc"); // executes m(String) from B
  }
}
// File B.java
package P1;

public class B {
  public void m(Object o) {...}
  void m(String s) {...}
}

However, suppose class B is moved to a new package; for instance, to P2:

// File B.java
package P2;   // new package for B

public class B {
  public void m(Object o) {...}
  void m(String s) {...}
} 

Although it might seem harmless, this Move Class changes the behavior of the program. In the new version, the call to m("abc") results in the execution of m(Object) instead of m(String). This occurs because class B no longer resides in the same package as A. Thus, m(String) is no longer visible to A as it is not a public method. For clarity, a public method of a public class, like m(Object), can be called from any part of the code. However, methods without an explicit visibility modifier, like m(String), can only be called by code within the same package.

In summary, this particular Move Class operation cannot be considered a refactoring, as it does not preserve the program’s behavior. If performed with the assistance of an IDE, the IDE should detect this problem and warn the user that the operation cannot be performed because it changes the program’s behavior.

9.5 Code Smells

Code Smells—also known as bad smells—are indicators of low-quality code, that is, code that is difficult to maintain, understand, modify, or test. In essence, code that does not smell good and therefore might need refactoring. However, it’s important to note that the term indicators suggests that not every smell requires an immediate refactor. This decision depends on various factors, such as the importance of the code containing the smell and how often it needs maintenance. With this point clarified, we will present some examples of code smells in the rest of this section.

9.5.1 Duplicated Code

Code duplication is a common code smell and one of those with the highest potential to harm the evolution of a system. Duplicated code increases maintenance effort, as changes need to be applied to more than one part of the code. Consequently, there is a risk of changing one part and forgetting about others. Duplicated code also makes the codebase more complex, as data and commands that could be encapsulated in methods or classes are instead scattered throughout the system.

To eliminate code duplication, the following refactorings can be used: Extract Method (recommended when the duplicated code is within two or more methods), Extract Class (recommended when the duplicated code refers to attributes and methods that appear in more than one class), and Pull Up Method (recommended when the duplicated code is a method present in two or more subclasses).

Blocks of duplicated code are called clones. However, different criteria can be used to define when two code blocks A and B are, in fact, duplicated. These criteria give rise to four types of clones, as described below:

Example: To illustrate these types of clones, we will use the following function:

int factorial(int n) {
  int fat = 1;
  for (i = 1; i <= n; i++)
    fat = fat * i;
  return fat;
}

Next, we show four clones of this function.

Type 1 clone: adds a comment and removes spaces between operators.

int factorial(int n) {
  int fat=1;
  for (int i=1; i<=n; i++) 
    fat=fat*i;
  return fat; // returns factorial
}

Type 2 clone: renames some variables.

int factorial(int n) {
  int f = 1;
  for (int j = 1; j <= n; j++) 
    f = f * j;
  return f; 
}

Type 3 clone: adds a statement that prints the value of the factorial.

int factorial(int n) {
  int fat = 1;
  for (int j = 1; j <= n; j++)
    fat = fat * j;
  System.out.println(fat); // new statement
  return fat;
}

Type 4 clone: implements a recursive version of the function.

int factorial(int n) {
  if (n == 0)
     return 1;
  else return n * factorial(n-1);
}

In these cases, we don’t need multiple factorial functions. One implementation should be chosen and the others should be removed.

Real World: In 2013, Aiko Yamashita and Leon Moonen, two researchers from a research lab in Norway, published the results of an exploratory study on code smells involving 85 developers (link). When asked about the smells they were usually most concerned with, the most common answer was Duplicated Code, with nearly 20 points on the scale used to rank the responses. The second most common smell, with half the points, was Long Method, which we will discuss in the following section.

9.5.2 Long Method

Ideally, methods should be small, with self-explanatory names and few lines of code. Long Methods are considered a code smell, as they make the code harder to understand and maintain. When we find a long method, we should consider using an Extract Method to break it into smaller methods. However, there’s no maximum number of lines of code that can arbitrarily be used to identify long methods, because this depends on the programming language, the relevance of the method, the system’s domain, and other factors. Nevertheless, nowadays, there’s a tendency to write small methods, with fewer than, for instance, 20 lines of code.

9.5.3 Large Class

Like methods, classes shouldn’t assume multiple responsibilities or provide services that aren’t cohesive. Hence, Large Classes are considered a smell because, just like long methods, they make the code harder to understand and maintain. These classes are also more challenging to reuse in another package or system. Large classes are characterized by a large number of attributes with low cohesion among them. The solution to this smell involves using an Extract Class to derive a smaller class B from a larger class A. Afterward, class A gets an attribute of type B.

In-Depth: When a class grows so much that it starts to monopolize most of the system’s intelligence, it’s called a God Class or a Blob. Classes with generic names like Manager, System, or Subsystem tend to be instances of this smell.

9.5.4 Feature Envy

This smell corresponds to a method that seems to envy the data and methods of another class. In other words, it accesses more attributes and methods of class B than of its current class A. Therefore, we may consider using a Move Method to migrate it to the envied class.

The following fireAreaInvalidated2 method is an example of Feature Envy. It has three method calls, all having the same abt object of type AbstractTool as their target. Moreover, it accesses no attributes or calls any methods of its current class. Therefore, moving this method to AbstractTool should be considered.

public class DrawingEditorProxy 
             extends AbstractBean implements DrawingEditor {
  ...
  void fireAreaInvalidated2(AbstractTool abt , Double r ) {
    Point p1 = abt.getView().drawingToView(...);
    Point p2 = abt.getView().drawingToView(...);
    Rectangle rect = new Rectangle(p1.x, p1.y, p2.x - p1.x, p2.y - p1.y);
    abt.fireAreaInvalidated (rect);
  }
  ...
}

9.5.5 Long Parameters List

In addition to being small, methods should, whenever possible, have few parameters. That is, a Long Parameters List is a smell that can be eliminated in two main ways. First, we should check whether one of the parameters can be obtained directly by the called method, as shown below:

p2 = p1.f();
g(p1, p2);

In this case, p2 is unnecessary, as it can be obtained at the beginning of g:

void g(p1) {
  p2 = p1.f(); ...
}

Another possibility is to create a class to group some of the parameters. For example, consider the following method:

void f(Date start, Date end) {
  ...
}

In this scenario, we can create a DateRange class to represent a range of dates:

class DateRange {
   Date start;
   Date end;
}

void f(DateRange range) {
  ...
}

9.5.6 Global Variables

As we studied in Chapter 5, global variables should be avoided, as they result in a poor type of coupling. For this reason, they are considered a code smell. Essentially, global variables make it more difficult to understand a function without looking at other parts of the system. To understand this issue, consider the following function:

void f(...) {
  // computes a certain value x
  return x + g; // where g is a global variable
}

Just from analyzing this code, can the reader tell what f returns? The answer is no, as we need to know g’s value. However, since g is a global variable, its value can be changed anywhere in the program, which can easily introduce bugs in this function because a single, distant, and unrelated line of code can influence f’s result. All it needs to do is to change the value of g.

It’s also worth noting that in languages like Java, static attributes work exactly like global variables. Therefore, they also represent a code smell.

9.5.7 Primitive Obsession

This code smell occurs when primitive types (that is, int, float, String, etc.) are used instead of specific classes. For example, suppose we need to declare a variable to store a ZIP code for an address. In a rush to quickly use the variable, we might declare it as a String, instead of creating a dedicated class—such as ZIPCode—for this purpose. The main advantage of using a dedicated class is that it can provide methods to manipulate and validate the stored values. For example, the class constructor can check if the provided ZIP code is valid before initializing the object. By using a dedicated class, we encapsulate this responsibility and, consequently, free the clients from this concern.

In short, we should not be obsessed with primitive types. Instead, we should consider creating classes that encapsulate primitive values and provide operations to handle them. In the next section, we’ll complement this recommendation by suggesting that such objects, when possible, should also be immutable.

9.5.8 Mutable Objects

In the second edition of his book on refactoring, Fowler considers mutable objects a code smell. A mutable object is one whose state can be modified after creation. On the other hand, an immutable object, once created, cannot be changed. To facilitate the creation of immutable objects, classes should declare all their attributes as private and final (a final attribute is read-only). The class itself should also be declared final to prohibit the creation of subclasses. Consequently, when we need to modify an immutable object, the only alternative is to create a new instance with the desired state. For example, String objects in Java are immutable, as illustrated in the following example.

class Main {
  public static void main(String[] args) {
    String s1 = "Hello World";
    String s2 = s1.toUpperCase();
    System.out.println(s1);
    System.out.println(s2);
  }
}

This program prints Hello World and then HELLO WORLD. The reason is that the toUpperCase method does not change the s1 string, but instead returns a new string with the uppercase version of the original one.

Whenever possible, we should create immutable objects, as they can be shared freely and safely with other functions. For example, in Java, you can pass a string as a parameter to a function f and be sure that f will not change its content. This would not be the case if strings were mutable, as it would be possible for f to change the string. An additional advantage is that immutable objects are thread-safe, meaning there is no need to synchronize thread access to their methods. The reason is simple: the classic problems of concurrent systems, such as race conditions, occur only when multiple threads can change the state of an object. If the state is immutable, such problems disappear by design.

However, we should understand this code smell within the context of the programming language used in a project. For example, in functional languages, objects are immutable by definition, meaning this code smell will never appear. On the other hand, in imperative languages, it is common to have a certain number of mutable objects. For this reason, when using these languages, we should try to minimize the number of such objects, without, however, expecting to completely eliminate them. As a guideline, we should make simple and small objects immutable, such as those representing ZIPCode, Currency, Address, Date, Time, Phone, Color, and Email, among others.

To illustrate, we show the implementation of an immutable Date type:

record Date(int day, int month, int year) {
  public Date {
    if (day < 1 || day > 31) {
      throw new IllegalArgumentException("Invalid day: " + day);
    }
    if (month < 1 || month > 12) {
      throw new IllegalArgumentException("Invalid month: " + month);
    }   
  }
}

This implementation uses the concept of records, introduced in Java 16, to facilitate the implementation of immutable data types in the language. A record is a restricted form of a class. For instance, a record has a name (Date in this case) and a set of fields (day, month, and year, in our example), which are final by definition. Furthermore, records automatically provide default methods, such as toString(), equals(), and hashCode().

9.5.9 Data Classes

These classes only have attributes and no methods, or at most, they have getters and setters. However, as is often the case with code smells, we should not consider every Data Class a major concern. Instead, the important thing is to analyze the code and try to move behavior to these classes. Specifically, whenever it makes sense, we should add methods to these classes implementing operations that are already being performed elsewhere.

9.5.10 Comments

It may seem strange to see comments included in a list of code smells. For instance, in introductory programming courses, students are usually encouraged to comment on all the code they produce, with the aim of learning the importance of code documentation. In the book Elements of Programming Style, Brian Kernighan—one of the creators of the Unix operating system and the C programming language—and P. J. Plauger offer advice that clarifies the view of comments as smells. They recommend the following:

Don’t comment bad code, rewrite it.

Thus, the idea is that comments should not be used to explain bad code. Instead, we should refactor the code, thereby improving readability. After doing that, there’s a good chance that the comments will become redundant. An example of this recommendation can be seen in the context of Long Methods, such as:

void f() {
  // task1
  ... 
  // task2
  ... 
  // taskn
  ...
}

If we apply Extract Method to the commented code, we’ll obtain the following improved version:

void task1() { ... }
void task2() { ... }
void taskn() { ... }

void f() { 
  task1();
  task2();
  ...
  taskn();
}

After this refactoring, the comments are no longer necessary, as the method names now clearly convey their purpose.

In-Depth: Technical debt is a term coined by Ward Cunningham in 1992 to designate technical problems that can hinder the maintenance and evolution of a system. These problems include, among others, lack of tests, architectural issues (for example, systems that are a Big Ball of Mud), systems with many code smells, and systems without any documentation at all. Cunningham’s intention was to create a term that could be understood by managers and people without knowledge of software engineering principles and practices. Hence, he chose the term debt to emphasize that these issues, if not resolved, will require the payment of interest. This interest usually manifests as inflexible and difficult-to-maintain systems, where bug fixes and the implementation of new features become increasingly time-consuming and risky.

To illustrate the concept, suppose that adding a new feature F1 to a program would require three days. However, if there were no technical debt, F1 could be implemented in just two days. This one-day difference represents the interest charged by the technical debt present in our program. One option is to pay off the principal of this debt, that is, to completely remove the technical debt. But that might take, for instance, four days. In other words, if we plan to evolve the program with F1 only, there’s no advantage. However, suppose that over the next few months we will need to implement more features, such as F2, F3, F4, etc. In this scenario, eliminating the principal of the technical debt might be worthwhile.

Bibliography

Martin Fowler. Refactoring: Improving the Design of Existing Code, Addison-Wesley, 1st edition, 2000.

Martin Fowler. Refactoring: Improving the Design of Existing Code, Addison-Wesley, 2nd edition, 2018.

Danilo Silva, Nikolaos Tsantalis, Marco Tulio Valente. Why We Refactor? Confessions of GitHub Contributors. ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE), p. 858-870, 2016.

Exercises

1. Mark the FALSE alternative:

(a) Refactorings improve the design of a software system.

(b) Refactorings make the code of a system easier to understand.

(c) Refactorings facilitate the location and correction of future bugs.

(d) Refactorings speed up the implementation of new features.

(e) Refactorings always improve a system’s runtime performance.

2. The following graph displays the number of features implemented in two similar systems (A and B), developed by comparable teams using the same technologies. In which system do you believe refactorings were systematically performed? Justify your answer.

3. Describe the differences between opportunistic and planned refactorings. Which of these types of refactoring do you think is more common? Justify your answer.

4. Provide the names of refactorings A and B that, if executed in sequence, do not change the system’s code. In other words, refactoring B reverts the transformations made by refactoring A.

5. In these examples, extract the lines commented with the word extract to a method named g.

class A {
  void f() {
    int x = 10;
    x++;      
    print x;   // extract
  }
}
class A {
  void f() {
    int x = 10;
    x++;     // extract
    print x; // extract
  }
}
class A {
  void f() {
    int x = 10;
    x++;     // extract
    print x; // extract
    int y = x+1;
    ...
  }
}
class A {
  void f() {
    int x = 10;
    int y;     // extract
    y = h()*2; // extract
    print y;   // extract
    int z = y+1;
    ...
  }
}

6. The following function calculates the n-th term of the Fibonacci sequence. The first term is 0, the second term is 1, and thereafter, the n-th term is the sum of the preceding two terms.

int fib(int n) {
  if (n == 1)
     return 0;
  if (n == 2)
     return 1;
  return fib(n-1) + fib(n-2);
}

Describe Type 1, 2, 3, and 4 clones for this function. It is not necessary to implement the clones, but be precise in your answer when indicating the differences between each clone and the original code.

7. Consider the following class that stores a specific value in dollars.

class Currency {
  ...
  private double value = 0.0;
  void add(Currency m) {
    this.value = this.value + m.getValue();
  } 
  ...
}

(a) Why are the objects of this class mutable?

(b) Reimplement the class to ensure that its objects are immutable.

8. As discussed in Section 9.5, comments that are used to explain bad code are a code smell. In these cases, the ideal solution is to clean the code, and then remove the comments. Below, we show another category of comments that may be deleted. Explain why these comments are unnecessary.

// Student class
class Student {
   
   // Student's enrollment number
   int enrollment;
   
   // Student's date of birth
   Date dateOfBirth;
   
   // Student's address
   Address address;
   
   // Default constructor of the Student class
   Student() {
     ...
   } 
   ...  
}

9. Use an IDE to perform a simple refactoring on one of your programs. For example, perform a method renaming. What are the advantages of performing refactorings with the support of IDEs?

10. Use a Java IDE to perform the Move Class refactoring discussed in Section 9.4.1. Did the IDE report any errors? If so, describe these errors.