Book cover
All rights reserved. Version for personal use only.
This web version is subjected to minor edits. To report errors or typos, use this form.

Home | Dark Mode | Cite

Software Engineering: A Modern Approach

Marco Tulio Valente

9 Refactoring

I’m not a great programmer; I’m just a good programmer with great habits. – Kent Beck

This chapter begins with an introduction to refactorings, which are modifications made to a program to make it easier to understand and evolve. In Section 9.2, we describe a series of refactoring operations. In many cases, we use code examples, some of which are derived from real refactorings performed on open-source projects. Next, in Section 9.3, we discuss some aspects of the practice of refactoring, including the importance of having a robust unit test suite. Section 9.4 explores the support provided by IDEs for automated refactoring. Finally, Section 9.5 describes a list of code smells, which are indicators that a particular code structure isn’t smelling good and therefore, could be the subject of a refactoring.

9.1 Introduction

In the previous chapter, we studied that software needs to be tested, like any engineering product. The same recommendation applies to maintenance activities. That is, software also needs maintenance. In fact, in Chapter 1, we mentioned the various types of maintenance that can be performed on software systems. For example, when a bug is detected, we have to apply corrective maintenance. When users or the Product Owner request a new feature, we have to conduct evolutionary maintenance. When a business rule or some technology used by the system changes, we must set aside time for performing an adaptive maintenance.

In addition, software systems also age, as happens with living beings. In the early 1970s, Meir Lehman, then working at IBM, elaborated on this phenomenon, and as a result, enunciated a set of empirical laws about aging, internal quality, and the evolution of software systems, known as the Software Evolution Laws or simply Lehman’s Laws. The first two laws stated by Lehman were:

  • A system must be continuously maintained to adapt to its environment. This process must continue until the point where it becomes more advantageous to replace it with a new system.

  • As a system undergoes maintenance, its internal complexity increases unless work is done to maintain or prevent this phenomenon.

The first law justifies the need for adaptive and evolutionary maintenance. It also mentions that systems can die, meaning there may come a time when it is more advantageous to replace a system with a new one. The second Lehman’s Law states that maintenance makes the code and the internal structure of a system more complex and harder to maintain in the future. In other words, there is a natural deterioration of a system’s internal quality as it undergoes maintenance and evolution. However, this second law also makes a reservation: unless work is done to maintain or prevent this phenomenon. Today, this work is called refactoring.

Refactorings are code transformations that improve a system’s maintainability without affecting its execution. To explain this definition, let’s divide it into three parts. First, when the definition mentions code transformations, it is referring to modifications in the code, like splitting a function into two, renaming a variable, moving a function to another class, extracting an interface from a class, etc. Next, the definition mentions the goal of such transformations: improve the maintainability of the system, that is, improve its modularity, design, or architecture, enhance its testability, make the code more readable, easier to understand and modify, etc. Finally, a restriction is set: it is not worth improving maintainability but changing the program results. In other words, refactoring must deliver the system working exactly as it was before the transformation. Another way to express this is to assert that refactorings must preserve the system’s behavior or semantics.

However, in the 70s and 80s when Lehman’s Laws were formulated, the term refactoring was not yet used. One of the first uses of this term occurred in 1992, in a doctoral thesis defended by William Opdyke at the University of Illinois (link). Next, in 1999, refactoring, already with this name, was included among the programming practices advocated by Extreme Programming. In the first edition of the XP book, it was recommended that developers should perform refactorings with the goal of restructuring their systems to remove duplication, improve communication with other developers, simplify or make the design more flexible.

In 2000, Martin Fowler released the first edition of a book dedicated to refactoring, which was very successful and contributed to popularizing this practice. One of the reasons for this success was that the book presents a catalog with dozens of refactorings. In a way that resembles a catalog of design patterns (as we studied in Chapter 6), the presentation of the refactorings starts by naming them, which helped create a vocabulary about refactoring. In the book, Fowler also presents the mechanics of each refactoring, includes code examples, and discusses the benefits and disadvantages of the refactorings.

In his book, Fowler also mentions Kent Beck’s phrase that opens this chapter and emphasizes the importance of following good programming habits, which are essential to maintain the program’s health, ensuring that it continues to provide value over the years. Therefore, developers should not only carry out corrective, adaptive, and evolutionary maintenance. It is also important to take care of the system’s maintainability by performing frequent refactorings.

Notice: The term refactoring has become common in Software Engineering. Therefore, sometimes it is used to indicate improvements in other non-functional requirements, besides maintainability. For example, we often hear developers mention they will refactor their code to improve performance, introduce concurrency, improve the usability of the interface, etc. However, in this book, we use the term restricted to its original definition, that is, only to refer to modifications that improve the maintainability of the source code.

9.2 Examples of Refactorings

In this section, we will present some widely used refactorings. For each one, we will comment on the name, motivation, and implementation mechanics. Furthermore, we illustrate the presentation with examples of real refactorings performed by open-source developers.

9.2.1 Extract Method

Extract Method is one of the most important refactorings as it aims to extract a piece of code from method f and move it to a new method g. Subsequently, f will include a call to g. The following code illustrates this refactoring. First, the code before the refactoring:

void f() {
  ... // A
  ... // B 
  ... // C
}

And now the code after the extraction of method g:

void g() {  // extracted method
  ... // B
}

void f () {
  ...  // A
  g();
  ...  // C
}

There are also variations in the method extraction mechanics. For instance, it’s possible to extract multiple methods, denoted as g1, g2, …, gn, from a single method f all at once. Additionally, we can extract the same code g from multiple methods, denoted as f1, f2, …, fn. In this case, the extraction is used to eliminate code duplication since the code in g is repeated in several methods.

To perform a Extract Method, we may have to pass parameters to the new method. This is necessary, for example, if the extracted method needs to access local variables from the original one. Additionally, the extracted method should return values if they are used later by the original method. Finally, if local variables are only used in the extracted code, we should also extract their declarations.

Extract Method is often referred to as the Swiss Army knife of refactorings due to its versatility and wide range of applications. For instance, it can be employed to break down a large method into smaller ones, each serving a specific goal that is reflected in its name. This approach also enhances the comprehensibility of the original method, as its body after the refactoring consists only of a sequence of calls. An example is provided below, and other applications of Extract Method are discussed in a following section.

Example: Here, we provide a real example of Extract Method from an Android system. This system has an onCreate method, which uses SQL commands to create the database tables it manipulates. The code for the initial version of onCreate—after a few edits, simplifications, and removal of comments—is presented below. The original source code exceeds 200 lines.

void onCreate(SQLiteDatabase database) {// before extraction
  // creates table 1
  database.execSQL("CREATE TABLE " +
            CELL_SIGNAL_TABLE + " (" + COLUMN_ID +
            " INTEGER PRIMARY KEY AUTOINCREMENT, " + ...
  database.execSQL("CREATE INDEX cellID_index ON " + ...);
  database.execSQL("CREATE INDEX cellID_timestamp ON " +...);

  // creates table 2
  String SMS_DATABASE_CREATE = "CREATE TABLE " + 
            SILENT_SMS_TABLE + " (" + COLUMN_ID +
            " INTEGER PRIMARY KEY AUTOINCREMENT, " + ...
  database.execSQL(SMS_DATABASE_CREATE);
  String ZeroSMS = "INSERT INTO " + SILENT_SMS_TABLE + 
            " (Address,Display,Class,ServiceCtr,Message) " +
            "VALUES ('"+ ...
  database.execSQL(ZeroSMS);

  // creates table 3
  String LOC_DATABASE_CREATE = "CREATE TABLE " +      
            LOCATION_TABLE + " (" + COLUMN_ID +
            " INTEGER PRIMARY KEY AUTOINCREMENT, " + ...
  database.execSQL(LOC_DATABASE_CREATE);
  // more 200 lines, creating other tables
}

To enhance the clarity of this method, a developer decided to perform seven Extract Method operations. Each extracted method became responsible for creating one of the tables. By examining the names of these methods, we can easily infer the tables they create. After these extractions, the onCreate method only calls the extracted methods, as shown below. The method’s size has been reduced from over 200 lines to just seven lines. It’s also worth noting that the extracted methods also have a parameter representing the database where the tables should be created.

public void onCreate(SQLiteDatabase database) { 
  createCellSignalTable(database);
  createSilentSmsTable(database);
  createLocationTable(database);
  createCellTable(database);
  createOpenCellIDTable(database);
  createDefaultMCCTable(database);
  createEventLogTable(database);
 }

Extract Method Motivations

In 2016, along with Danilo Silva and Prof. Nikolaos Tsantalis from Concordia University in Montreal, we conducted a study with GitHub developers to discover their motivations for performing refactorings (link). In the study, we identified 11 distinct motivations for method extractions, as shown in the following table.

Motivations for Extract Method Occurrences
Extracting a method that will be called by other methods 43
Introducing an alternative signature for an existing method 25
Breaking a method into smaller parts to improve understanding 21
Extracting a method to remove code duplication 15
Extracting a method to facilitate the implementation of a new feature or to fix a bug 14
Extracting a method with a better name or fewer parameters 6
Extracting a method to test its code 6
Extracting a method that will be overridden by subclasses 4
Extracting a method to allow recursive calls 2
Extracting a method that will work as a factory 1
Extracting a method that will run on a thread 1

Analyzing this table, we can see that the primary motivation is to extract code for reuse. Developers often come across a piece of code they need, only to find it is already implemented within a method f. To make this code accessible, they perform a Extract Method, creating a new method g. This allows them to call g in the code they are working on. Here’s an example of how one developer reported this motivation for extracting a method:

These refactorings were made because of code reusability. I needed to use the same code in a new method. I always try to reuse code, because when there’s a lot of code redundancy it gets overwhelmingly more complicated to work with the code in future, because when something change in code that is duplicate somewhere, it usually needs to be changed also there.

As seen in the previous table, the second most frequent motivation is introducing an alternative signature for a method. An example is shown below, involving a method that logs a string to a file. We first show the code before the refactoring:

void log(String msg) {
  // saves msg to file
}

The developer decided to provide an alternative version of this method with a boolean parameter, indicating whether the string should also be printed on the console. Here is the code after extracting this second log method:

void log(String msg, boolean console) {
  if (console)
     System.out.println(msg);
  // saves msg to a file
}

void log(String msg) {
  log(msg, false);
}

The reader should also observe that, technically, this transformation is an Extract Method. Initially, the log(String) code was extracted to log(String, boolean), creating a new method. Subsequently, minor adjustments were made to the extracted code, including the addition of an if statement. Such adjustments are common and do not disqualify the transformation as a method extraction.

9.2.2 Inline Method

This refactoring operates in the opposite direction of method extraction. Consider a small method with only one or two lines of code, which are rarely called. Therefore, the benefits offered by this method, both in terms of reuse and organization of the code, are minimal. Consequently, it can be eliminated from the system, and its body can be incorporated into the call sites. It’s important to note that Inline Method is a less common and less significant operation compared to Extract Method.

Example: The following code shows an example of Inline Method. First, here is the code before the inline. We can see that the writeContentToFile method has a single line of code and is called only once by the write method.

private void writeContentToFile(final byte[] revision) {
  getVirtualFile().setBinaryContent(revision);
}
 
private void write(byte[] revision) {
  VirtualFile virtualFile = getVirtualFile();
  ...
  if (document == null) {
     writeContentToFile(revision);
  }
   ...
 }

The developers of this code decided to delete the writeContentToFile method and expand its body at the single call site. The code after the refactoring is shown below.

private void write(byte[] revision) {
  VirtualFile virtualFile = getVirtualFile();
  ...
  if (document == null) {
     virtualFile.setBinaryContent(revision); // after inline
  }
  ...
}

9.2.3 Move Method

It is not uncommon to find a method implemented in the wrong class. That is, even though a method f is located in a class A, it might use more services from class B. For instance, it may have more dependencies on elements of B than on elements of its original class. In these cases, moving f to B should be considered. This refactoring can improve the cohesion of A, reduce the coupling between A and B, and ultimately make both classes easier to understand and modify.

Due to its characteristics, Move Method is one of the refactorings with the highest potential to improve modularity. As its effects are not restricted to a single class, Move Method can have a positive impact on the system’s architecture, ensuring that methods are placed in the appropriate classes, both from a functional and architectural viewpoint.

Example: Suppose the code of an IDE. In this code, we have a class called PlatformTestUtil, which is responsible for a given feature provided by the IDE. In fact, details about this feature are not important for our point here and thus we will omit them. For us, the key point is that this class has a method called averageAmongMedians (see below), which calculates the average of the medians stored in an array. However, this method has no relation with the other methods in PlatformTestUtil. For example, it does not call methods or use attributes of PlatformTestUtil. It is, in summary, independent of the rest of the class.

class PlatformTestUtil {
  ...
  public static long averageAmongMedians(long[] time, 
                                         int part) {
    int n = time.length;
    Arrays.sort(time);
    long total = 0;
    for (int i= n/2-n/part/2; i< n/2+n/part/2; i++) {
      total += time[i];
    }
    int middlePartLength = n/part;
    return middlePartLength == 0 ? 0:total/middlePartLength;
  }
  ...
}

Thus, the developers of this IDE decided to move averageAmongMedians to a class called ArrayUtil, whose purpose is exactly to provide general functions for array manipulation. Thus, it is a class more related to the service provided by averageAmongMedians.

After the Move Method, the existing calls to averageAmongMedians should be updated, as shown in the figure.

Updating static method calls after a Move Method. averageAmongMedians was moved from PlatformTestUtil to ArrayUtil.

However, as the reader can observe in this figure, this update was not difficult because averageAmongMedians is a static method. Therefore, only the class name was changed at each call site.

In other cases, however, it’s not simple to update the calls after a Move Method. This happens when there are no references to objects of the target class in the call sites. In such cases, a possible solution is to leave a simple implementation of the method in the original class, which just forwards the calls to the new method’s class. This way, the existing calls do not need to be updated. An example is shown below. First, the original code:

class A {
  B b = new B();
  void f { ... }
}

class B { ... }

class Client {
  A a = new A();
  void g() {
    a.f(); 
    ...
  }
}

And now the code after the refactoring:

class A {
  B b = new B();
  void f {
    b.f(); // just forwards the call to B
  }
}

class B {    // f was moved from A to B
  void f { ... }
}

class Client {
  A a = new A();
  void g() { 
    a.f();    // no changes
    ...
  }
}

Note that f was moved from class A to class B. However, in A, we left a version of the method that simply forwards the call to B. That’s why the Client class didn’t need updates.

When they occur in the same hierarchy of classes, Move Method receives special names. For example, when we move a method from a subclass to a superclass, it is called Pull Up Method. To illustrate, suppose the same f method is implemented in subclasses B1 and B2. Thus, this duplicated implementation can be avoided by pulling up each implementation to the superclass A, as shown next.

Pull Up Method

Conversely, when a method is moved down the classes hierarchy, that is, from a superclass to a subclass, we say a Push Down Method was performed. For instance, despite being implemented in superclass A, a method f might be of interest to only one subclass, let’s call it B1. Therefore, we can push down its implementation to B1, as shown next.

Push Down Method

To conclude, refactoring operations can be done in sequence. For example, consider the following class A with a method f:

class A {
  B b = new B();

  void f(){
    S1;
    S2;
  }

}

class B { ... }

First, from f, we’ll extract a method g with the statement S2:

class A {
  B b = new B();

  void g() { // extracted method
    S2;
  }

  void f(){
    S1;
    g();
  }
}

class B { ... }

Next, we’ll move g to class B, as illustrated next:

class A {
  B b = new B();

  void f(){
    S1;
    b.g();
  }
}

class B {

  void g() { // moved method
    S2;
  }

}

9.2.4 Extract Class

This refactoring is recommended when a program has a class A with many responsibilities and attributes. Some of these attributes are related and can be defined together, but in another location. In other words, they can be extracted into a new class B. After the extraction, we should declare in A an attribute of type B.

Example: Let’s consider the class Person shown below. In addition to other attributes, which have been omitted from the example, it stores the person’s phones, including area code and number.

class Person {
  String areaCode;
  String phone;
  String alternativeAreaCode;
  String alternativePhone;
  ...
}

Thus, we can extract from Person a new class, called Phone, as shown below. After the refactoring, Person gains two attributes of this new type.

class Phone { // extracted class
  String areaCode;
  String number; 
}

class Person {
  Phone phone;
  Phone alternativePhone;
  ...
}

A similar refactoring is called Extract Interface. For example, suppose a library that implements classes LinkedList and ArrayList. From these classes, we can extract an interface List, with the methods common to them. Thus, the library’s clients can start practicing the Prefer Interfaces to Concrete Classes design principle, which we studied in Chapter 5.

9.2.5 Renaming

There’s a provocative phrase, attributed to Phil Karlton, that says there are only two hard things in Computer Science: cache invalidation and naming things. Thus, as naming is hard, we often have to rename a code element, be it a variable, function, method, parameter, attribute, class, etc. This may happen because the initial name of the element was not a good choice. Another reason is that the responsibility of this element may have changed over time, and thus its name became outdated. In both cases, it is recommended to perform one of the most popular refactorings: renaming. That is, giving a more appropriate and meaningful name to a code element.

When this refactoring is applied, the most complex part is not changing the name of the element, but updating the points in the code where it is referenced. For example, if a method f is renamed to g, all calls of f must be updated. In fact, if f is widely used, it may be interesting to extract it to a new method, with the new name, and keep the old method deprecated.

To illustrate, suppose the following method f:

void f () {
  // A
}

Next, we show the code after renaming f to g. Note that we have kept a deprecated version of f in the code, which just calls g.

void g() {    // new name 
  // A
}

@deprecated
void f() {    // old name
  g();        // forwards call to new name
}

Deprecation is a mechanism offered by programming languages to indicate that a code element is outdated and, therefore, should not be used anymore. When the compiler finds out that a piece of code is using a deprecated element, it issues a warning. As we commented in the previous example, the method f was not simply renamed to g. Instead, first, a method g was created with f’s original code. Then, f was deprecated, and its code modified to only call g.

This deprecation-based strategy makes renaming safer, as it does not require updating all calls of f at once. Thus, it gives time for clients to adapt to the change and start using the new name. In fact, refactorings—even the simplest ones, like a renaming—should be performed in small or baby steps to make sure they won’t introduce bugs in a code that was working before.

9.2.6 Other Refactorings

The refactorings presented earlier are the ones with the higher potential for improving a system’s design, as they involve operations with a global scope, such as Move Method or Extract Class. However, there are refactorings with a narrow scope that improve, for example, the internal implementation of a single method. Next, we briefly describe some of these refactorings.

Extract Variable is used to simplify expressions and make them easier to read and understand. Here’s an example:

x1 = (-b + sqrt(b*b-4*a*c)) / (2*a);

This code can be refactored to:

delta = b*b-4*a*c; // extracted variable
x1 = (-b + sqrt(delta)) / (2*a);

Note that a new variable delta was created and initialized with part of a larger expression. As a result, this expression, after the refactoring, became shorter and easier to understand.

Remove Flag is a refactoring that suggests using commands like break or return, instead of control variables (or flags). Here’s a code example:

boolean search(int x, int[]a) {
   boolean found = false;
   i = 0;
   while (i < a.length) && (!found) {
     if (a[i] == x);
        found = true;
     i++;
   }
   return found;
}

This code can be refactored as follows:

boolean search(int x, int[]a) {
  for (i = 0; i < a.length; i++)
    if (a[i] == x)
       return true;
  return false;
}

After this refactoring, the function became smaller and with a clearer logic, thanks to using a return command to exit immediately once the searched value has been found. The flag variable found was then removed.

There are also refactorings to simplify conditional commands. One of them is called Replace Conditional with Polymorphism. To understand it, consider a switch command that returns the value of a student’s research grant, depending on the student’s type:

switch (student.type) {
  case "undergraduate": 
    grant = 750;
    break;
  case "master": 
    grant = 2000;
    break;
  case "phd": 
    grant = 3000;
    break;
}

In object-oriented languages, this command can be refactored to a single line of code:

grant = student.getGrant();

In the refactored version, the type attribute of Student is no longer necessary, so it can be removed. However, we must implement in the Student’s subclasses—for example, UndergraduateStudent, MasterStudent, and PhDStudent—a method getGrant(). Finally, in the superclass Student, this method must be abstract, that is, it should have only a signature, without a body.

Remove Dead Code recommends deleting methods, classes, variables, or attributes that are no longer used. For example, in the case of a method, there might be no calls to it anymore. In the case of a class, it might not be instantiated or inherited by any other classes. In the case of an attribute, it might not be used within the class body, nor in subclasses or in other classes. It might sound this is a uncommon refactoring, but in large systems, maintained over years by literally hundreds of developers, there is usually a considerable amount of code that is no longer used.

9.3 Refactoring Practice

After describing various refactorings in the previous section, let’s now discuss how the practice of refactoring can be adopted in real software projects.

Firstly, successful refactoring depends on the existence of good tests, primarily unit tests. That is, without tests, it becomes risky to make changes in a system, especially when these changes do not add new functionalities or correct bugs, as is the case with refactorings. John Ousterhout comments on the importance of tests during refactoring activities (link, Section 19.3):

Tests, particularly unit tests, play an important role in design because they facilitate refactoring. Without a test suite, its dangerous to make major structural changes to a system. […] As a result, developers avoid refactoring in a system without good test suites; they try to minimize the number of code changes for each new feature or bug fix, which means that complexity accumulates and design mistakes don’t get corrected.

A second important issue concerns when code should be refactored. There are two main ways to perform refactorings: opportunistically or strategically.

Opportunistic refactorings are carried out in the midst of a programming task when we discover that a piece of code is not well implemented and therefore can be improved. This can happen when correcting a bug or implementing a new feature. For instance, in the middle of these tasks, we might notice that a method’s name is unclear, that a method is too long and difficult to understand, that a conditional command is too complex, that some code is no longer used, etc. Thus, whenever we detect problems with a piece of code, we should refactor it as soon as possible. To try to give some concrete guides, suppose we will spend an hour implementing a new feature. Thus, it is desirable to invest part of that time—perhaps 20% or more—in refactorings. Kent Beck has a fascinating quote about opportunistic refactorings:

For each desired change, make the change easy (warning: this may be hard), then make the easy change.

The underlying premise of this recommendation is that we may be struggling to implement a change precisely because the code is not prepared to accommodate it. Therefore, we should first take a step backward and refactor the code to make the change easy. Once done, we can take two steps forward and easily implement the change we were assigned.

Most of the time, refactorings are opportunistic. However, it is also possible to have planned refactorings. Usually, they involve more time-consuming and complex changes that cannot be accommodated into a typical development task. Instead, they should be carried out in scheduled and dedicated sessions. For instance, these refactorings might involve breaking a package into two or more subpackages. As a second example, suppose the teams neglected the practice of refactoring for a long time. As there are now numerous pending refactorings, it’s better to schedule them for a specific time period.

9.4 Automated Refactorings

Today, most IDEs provide support to automate refactorings in the following way: developers select the block of code they intend to refactor and the refactoring operation they wish to perform. Then, the IDE performs this operation automatically. The following figures illustrate an automatic renaming via an IDE. Firstly, in this example, the developer highlights the m1 method:

Next, the developer selects the Refactor and Rename options:

Then the IDE asks for the new name of the method (follow in the next figure). In the same dialog box, the developer requests to update the references to this method.

Afterwards, the IDE carries out the refactoring:

As we can see, the method was renamed to m11. Furthermore, the calls in m3 and m4 were updated to use this new name.

Although it’s called an automated refactoring, the example makes it clear that it’s up to developers to specify the code they want to refactor and the refactoring operation they intend to perform. The developers must also provide information about this refactoring, such as the new name in the case of renaming. From this point on, the refactoring becomes automated.

Before applying the refactoring, the IDE checks whether its preconditions are true, i.e., if the refactoring will not result in a compilation error or change the program’s behavior. In the previous example, if the user asks to rename m1 to m2, the IDE informs that this refactoring can’t be performed, as the class already has a method named m2.

9.4.1 Checking Refactoring Preconditions

The task of checking the preconditions of a refactoring, in most cases, is not simple. To illustrate the complexity of this task, let’s re-use a small Java program proposed by Friedrich Steimann and Andreas Thies (link). As outlined below, this program has two classes, A and B, implemented in distinct files but belonging to the same package P1. Thus, the call to m("abc") in the first file executes the m(String) in class B.

// File A.java
package P1;

public class A {
  void n() { 
    new B().m("abc"); // executes m(String) from B
  }
}
// File B.java
package P1;

public class B {
  public void m(Object o) {...}
  void m(String s) {...}
}

However, suppose class B is moved to a new package; for instance, to P2:

// File B.java
package P2;   // new package for B

public class B {
  public void m(Object o) {...}
  void m(String s) {...}
} 

Although it might seem harmless, this Move Class changes the behavior of the program. In the new version, the call to m("abc") results in the execution of m(Object) and not m(String). The reason is that class B no longer resides in the same package as A. Thus, m(String) is no longer visible to A as it is not a public method. For those unfamiliar with Java, a public method of a public class, like m(Object), can be called from any part of the code. But methods without an explicit visibility modifier, like m(String), can only be called by code within the same package.

In summary, this particular Move Class operation is not a refactoring, as it does not preserve the program’s behavior. If performed with the assistance of an IDE, the IDE should detect this problem and warn the user that the operation can’t be performed because it changes the program’s behavior.

9.5 Code Smells

Code Smells—also known as bad smells—are indicators of low-quality code, that is, code that is difficult to maintain, understand, modify, or test. In short, code that does not smell good and therefore might need refactoring. However, in this definition, the term indicators means that we should not consider that every smell requires an immediate refactor. This decision depends on other factors, such as the importance of the code containing the smell and how often it will need maintenance. After clarifying this point, we will present, in the rest of this section, some examples of code smells.

9.5.1 Duplicated Code

Code duplication is a widely-common code smell and among the ones with the highest potential to harm the evolution of a system. Duplicated code increases maintenance effort, as changes need to be applied in more than one part of the code. Consequently, there is a risk of changing one part and forgetting about another. Duplicated code also makes the codebase more complex, as data and commands that could be encapsulated in methods or classes are scattered throughout the system.

To eliminate code duplication, the following refactorings can be used: Extract Method (recommended when the duplicated code is within two or more methods), Extract Class (recommended when the duplicate code refers to attributes and methods that appear in more than one class), and Pull Up Method (recommended when the duplicate code is a method present in two or more subclasses).

Blocks of duplicated code are called clones. However, different criteria can be used to define when two code blocks A and B are, in fact, duplicated. These criteria give rise to four types of clones, as described below:

  • Type 1: when A and B have the same code, with differences only in comments and spaces.

  • Type 2: when A and B are Type 1 clones, but the variables used in A and B may have different names.

  • Type 3: when A and B are Type 2 clones, but with minor changes in commands.

  • Type 4: when A and B are semantically equivalent, but their implementations use different algorithms.

Example: To illustrate these types of clones, we will use the following function:

int factorial(int n) {
  fat = 1;
  for (i = 1; i <= n; i++)
    fat = fat * i;
  return fat;
}

Next, we show four clones of this function.

  • Type 1: adds a comment and removes spaces between operators.
int factorial(int n) {
  fat=1;
  for (i=1; i<=n; i++) 
    fat=fat*i;
  return fat; // returns factorial
}
  • Type 2: renames some variables.
int factorial(int n) {
  f = 1;
  for (j = 1; j <= n; j++) 
    f = f * j;
  return f; 
}
  • Type 3: adds a command that prints the value of the factorial.
int factorial(int n) {
  fat = 1;
  for (j = 1; j <= n; j++)
    fat = fat * j;
  System.out.println(fat); // new command
  return fat;
}
  • Type 4: implements a recursive version of the function.
int factorial(int n) {
  if (n == 0)
     return 1;
  else return n*factorial(n-1);
}

In these cases, we don’t need to have two or more factorial functions. Only one can stay in the code and the others can be removed.

Real World: In 2013, Auki Yamashita and Leon Moonen, two researchers from a research lab in Norway, published the results of an exploratory study on code smells involving 85 developers (link). When asked about the smells they were usually most concerned with, the most common answer was Duplicated Code, with nearly 20 points on the scale used to rank the responses. In second place, with half the points, was Long Method, which is the smell we will discuss next.

9.5.2 Long Method

Ideally, methods should be small, with self-explanatory names and few lines of code. Long Methods are considered a code smell, as they make the code harder to understand and maintain. When we find a long method, we should consider use an Extract Method to break it into smaller methods. However, there’s no maximum number of lines of code that can arbitrarily be used to identify long methods because this depends on the programming language, the relevance of the method, the system’s domain, among other factors. Nevertheless, nowadays, there’s a tendency to write small methods, with fewer than, for instance, 20 lines of code.

9.5.3 Large Class

Like methods, classes shouldn’t assume multiple responsibilities and provide services that aren’t cohesive. Hence, Large Classes are considered a smell because, just like long methods, they make the code harder to understand and maintain. Normally, it’s also more challenging to reuse these classes in another package or system. Large classes are characterized by a large number of attributes with low cohesion among them. The solution to this smell involves using an Extract Class to derive a smaller class B from a larger class A. Thereafter, class A gets an attribute of type B.

In-Depth: When a class grows so much that it starts to monopolize most of the system’s intelligence, it’s called a God Class or a Blob. Classes with generic names like Manager, System, or Subsystem tend to represent instances of this smell.

9.5.4 Feature Envy

This smell corresponds to a method that seems to envy the data and methods of another class. In other words, it accesses more attributes and methods of class B than of its current class A. Therefore, we may consider using a Move Method to migrate it to the envied class.

The fireAreaInvalidated2 method, shown in the following code, is an example of Feature Envy. It has three method calls, but all having as the target the same abt object of type AbstractTool. Besides, it accesses no attribute or calls any method of its current class. Therefore, moving this method to AbstractTool should be considered.

public class DrawingEditorProxy 
             extends AbstractBean implements DrawingEditor {
  ...
  void fireAreaInvalidated2 (AbstractTool abt , Double r ){
    Point p1 = abt.getView().drawingToView (...);
    Point p2 = abt.getView().drawingToView (...);
    Rectangle r=new Rectangle(p1.x,p1.y,p2.x-p1.x p2.y-p1.y);
    abt.fireAreaInvalidated (r);
  }
  ...
}

9.5.5 Long Parameters List

In addition to being small, methods should, whenever possible, have few parameters. That is, a Long Parameters List is a smell that can be eliminated in two main ways. First, we should check whether one of the parameters can be obtained directly by the called method, as shown below:

p2 = p1.f();
g(p1, p2);

In this case, p2 is unnecessary, as it can be obtained at the beginning of g:

void g(p1) {
  p2 = p1.f(); ...
}

Another possibility is to create a type grouping some of the parameters. For example, consider the following method:

void f(Date start, Date end) {
  ...
}

Thus, we can create a DateRange class to represent a range of dates:

class DateRange {
   Date start;
   Date end;
}

void f(DateRange range} {
  ...
}

9.5.6 Global Variables

As we studied in Chapter 5, global variables should be avoided, as they result in a poor type of coupling. For this reason, they are also considered a code smell. Essentially, Global variables make it more difficult to understand a function without looking at other parts of the system. To understand better, consider the following function:

void f(...) {
  // computes a certain value x
  return x + g; // where g is a global variable
}

Just from analyzing this code, can the reader tell what f returns? The answer is no, as we need to know g’s value. However, since g is a global variable, its value can be changed anywhere in the program, which can easily introduce bugs in this function because a single, distant, and unrelated line of code can influence f’s result. All it needs to do is to change the value of g.

It’s also worth noting that in languages like Java, static attributes work exactly like global variables. Therefore, they also represent a code smell.

9.5.7 Primitive Obsession

This code smell occurs when primitive types (that is, int, float, String, etc.) are used instead of specific classes. For example, suppose we need to declare a variable to store a ZIP code for an address. In a rush for rapidly using the variable, we might declare it as a String, instead of creating a dedicated class—for instance, ZIPCode—for this purpose. The main advantage is that a class can provide methods to manipulate the stored values. For example, the class constructor can check if the informed ZIP code is valid before initializing the object. This way, the class assumes this responsibility and, consequently, frees the clients from this concern.

In short, we should not be obsessed with primitive types. Instead, we should consider the possibility of creating classes that encapsulate primitive values and provide operations to handle them. In the next smell, we’ll complement this recommendation and suggest that such objects, whenever possible, should also be immutable.

9.5.8 Mutable Objects

In the second edition of his book on refactoring, Fowler considers Mutable Objects a code smell. A mutable object is one whose state can be modified. On the other hand, an immutable object, once created, can no longer be changed. To facilitate the creation of immutable objects, classes should declare all their attributes as private and final (a final attribute is read-only). The class should also be declared final to prohibit the creation of subclasses. Consequently, if we need to modify an immutable object, the alternative is to create a new instance with the desired state.

For example, String objects in Java are immutable, as illustrated by the following program.

class Main {
  public static void main(String[] args) {
    String s1 = "Hello World";
    String s2 = s1.toUpperCase();
    System.out.println(s1);
    System.out.println(s2);
  }
}

This program prints Hello World and then HELLO WORLD. The reason is that the toUpperCase method does not change the s1 string, but returns a copy of it in uppercase.

Whenever possible, we should create immutable objects, as they can be shared freely and safely with other functions. For example, in Java, you can pass a string as a parameter to a function f and be sure that f will not change its content. This would not happen if strings were mutable because it would be possible for f to change the string. As a second advantage, immutable objects are thread-safe, meaning there is no need to synchronize the access of threads to their methods. The reason is simple: the classical problems of concurrent systems, like race conditions, occur only when multiple threads can change the state of an object. If the state is immutable, such problems automatically disappear.

However, we should understand this code smell within the context of the programming language used in a project. For example, when we use a functional language, objects are immutable by definition, meaning, this code smell will never appear. On the other hand, when we use an imperative language, it is normal to have a certain number of mutable objects. For this reason, in these languages, we should try to minimize the number of such objects, without, however, assuming we will completely eliminate them. For instance, we should make simple and small objects immutable, like those representing ZIPCode, Currency, Address, Date, Time, Phone, Color, Email, etc. To illustrate, we show next the implementation of an immutable Date class:

final public class Date { // final: cannot have subclasses
  final private int day; // final: read-only attributes
  final private int month;
  final private int year;
   
  private void check(int day, int month, int year) 
               throws InvalidDateException {
     // verifies valid date
     // if not valid, throws InvalidDateException
  } 
   
  public Date(int day, int month, int year) 
         throws InvalidDateException {
    check(day, month, year);
    this.day = day;
    this.month = month;
    this.year = year;
  }
   
  // other methods
}

9.5.9 Data Classes

These classes only have attributes, and no methods. At most, they have getters and setters. However, as is recurrent with code smells, we should not consider any Data Class as a major concern. Instead, the important thing is to analyze the code and try to move behavior to these classes. That is, whenever it makes sense, we should add methods to these classes with operations that are already being done, but throughout other classes.

9.5.10 Comments

It is strange to see comments included in a list of code smells. For instance, in introductory programming courses, students are usually encouraged to comment on all the code they produce, with the aim of learning the importance of code documentation. In the book Elements of Programming Style, Brian Kerninghan—one of the creators of the Unix operating system and the C programming language—and P. J. Plauger offer an advice that clarifies the view of comments as smells. They recommend the following:

Don’t comment bad code, rewrite it.

Thus, the idea is that comments should not be used to explain bad code. Instead, we should refactor the code and, thereby, improve its readability. Having done that, there’s a good chance that the comments will no longer be necessary. An example are Long Methods like this one.

void f() {
  // task1
  ... 
  // task2
  ... 
  // taskn
  ...
}

If we use Extract Method to extract the commented code, we’ll get the following improved version:

void task1 { ... }
void task2 { ... }
void taskn { ... }

void f { 
  task1();
  task2();
  ...
  taskn();
}

After this refactoring, the comments are no longer necessary, as the method names already reveal their purpose.

In-Depth: Technical debt is a term coined by Ward Cunningham in 1992 to designate technical problems that can hinder the maintenance and evolution of a system. Among others, these problems include lack of tests, architectural issues (for example, systems that are a Big Ball of Mud), systems with many code smells, or without any documentation at all. Cunningham’s intention was to create a term understood by managers and people without knowledge of software engineering principles and practices. Hence, he chose the term debt to reinforce that these issues, if not resolved, will require the payment of interest. This interest usually manifests in the form of inflexible and difficult-to-maintain systems, where bug fixes and the implementation of new features increasingly take more time and are more risky.

To illustrate the concept, suppose that adding a new feature F1 in a program would require three days. However, if there were no technical debt, F1 could be implemented in just two days. This one-day difference constitutes the interest charged by the technical debt present in our program. One alternative is to pay off the principal of this debt, that is, completely remove the technical debt. But that might take, for instance, four days. In other words, if we plan to evolve the program with F1 only, there’s no advantage. However, suppose that over the next months we will have to implement more features, such as F2, F3, F4, etc. In this case, eliminating the principal of the technical debt might pay off.

Bibliography

Martin Fowler. Refactoring: Improving the Design of Existing Code, Addison-Wesley, 1st edition, 2000.

Martin Fowler. Refactoring: Improving the Design of Existing Code, Addison-Wesley, 2nd edition, 2018.

Danilo Silva, Nikolaos Tsantalis, Marco Tulio Valente. Why We Refactor? Confessions of GitHub Contributors. Foundations of Software Engineering, 2016.

Exercises

1. Mark the FALSE alternative:

(a) Refactorings improve the design of a software system.

(b) Refactorings make the code of a system easier to understand.

(c) Refactorings facilitate the location and correction of future bugs.

(d) Refactorings speed up the implementation of new features.

(e) Refactorings improve a system’s runtime performance.

2. The following graph displays the number of features implemented in two similar systems (A and B), developed by comparable teams using the same technologies. In which system do you believe refactorings were systematically performed? Justify your answer.

3. Describe the differences between opportunistic and planned refactorings. Which of these types of refactoring should be more common?

4. Give the names of refactorings A and B that, if executed in sequence, do not change the system’s code. In other words, refactoring B undoes the transformations made by refactoring A.

5. In these examples, extract the lines commented with the word extract to a method g.

class A {
  void f() {
    int x = 10
    x++;      
    print x;   // extract
  }
}
class A {
  void f() {
    int x = 10
    x++;     // extract
    print x; // extract
  }
}
class A {
  void f() {
    int x = 10
    x++;     // extract
    print x; // extract
    int y = x+1;
    ...
  }
}
class A {
  void f() {
    int x = 10
    int y;     // extract
    y = h()*2; // extract
    print y;   // extract
    int z = y+1;
    ...
  }
}

6. The following function calculates the n-th term of the Fibonacci sequence. The first term is 0, the second term is 1, and thereafter, the n-th term is the sum of the preceding two terms.

int fib(int n) {
  if (n == 1)
     return 0;
  if (n == 2)
     return 1;
  return fib(n-1) + fib(n-2);
}

Describe Types 1, 2, 3, and 4 clones for this function. It is not necessary to implement the clones, but be precise in your answer when indicating the differences between each clone and the original code.

7. Consider the following class that stores a specific value in dollars.

class Currency {
  ...
  private double value = 0.0;
  void add(Currency m) {
    this.value = this.value + m.getValue();
  } 
  ...
}

(a) Why are the objects of this class mutable?

(b) Reimplement the class to ensure that its objects are now immutable.

8. As discussed in Section 9.5, comments that are used to explain bad code are a code smell. In these cases, the ideal is to clean the code, and then, remove the comments. Next, we show another category of comments that may be deleted. Explain why these comments are unnecessary.

// Student class
class Student {
   
   // Student's enrollment number
   int enrollment;
   
   // Student's date of birth
   Date dateOfBirth;
   
   // Student's address
   Address address;
   
   // Default constructor of the Student class
   Student() {
     ...
   } 
   ...  
}

9. Use an IDE to perform a simple refactoring on one of your programs. For example, perform a method renaming. What are the advantages of performing refactorings with the support of IDEs?

10. Use a Java IDE to perform the Move Class refactoring discussed in Section 9.4.1. Did the IDE report any error? If so, describe this error.