I’m not a great programmer; I’m just a good programmer with great habits. – Kent Beck
This chapter begins with an introduction to refactorings, which are
modifications made to a program to make it easier to understand and
evolve. In Section 9.2, we describe a series of refactoring operations.
In many cases, we use code examples, some of which are derived from real
refactorings performed on open-source projects. Next, in Section 9.3, we
discuss some aspects of the practice of refactoring, including the
importance of having a robust unit test suite. Section 9.4 explores the
support provided by IDEs for automated refactoring. Finally, Section 9.5
describes a list of code smells, which are indicators that a particular
code structure isn’t smelling good
and therefore, could be the
subject of a refactoring.
In the previous chapter, we studied that software needs to be tested, like any engineering product. The same recommendation applies to maintenance activities. That is, software also needs maintenance. In fact, in Chapter 1, we mentioned the various types of maintenance that can be performed on software systems. For example, when a bug is detected, we have to apply corrective maintenance. When users or the Product Owner request a new feature, we have to conduct evolutionary maintenance. When a business rule or some technology used by the system changes, we must set aside time for performing an adaptive maintenance.
In addition, software systems also age, as happens with living beings. In the early 1970s, Meir Lehman, then working at IBM, elaborated on this phenomenon, and as a result, enunciated a set of empirical laws about aging, internal quality, and the evolution of software systems, known as the Software Evolution Laws or simply Lehman’s Laws. The first two laws stated by Lehman were:
A system must be continuously maintained to adapt to its environment. This process must continue until the point where it becomes more advantageous to replace it with a new system.
As a system undergoes maintenance, its internal complexity increases unless work is done to maintain or prevent this phenomenon.
The first law justifies the need for adaptive and evolutionary
maintenance. It also mentions that systems can die
, meaning there
may come a time when it is more advantageous to replace a system with a
new one. The second Lehman’s Law states that maintenance makes the code
and the internal structure of a system more complex and harder to
maintain in the future. In other words, there is a natural deterioration
of a system’s internal quality as it undergoes maintenance and
evolution. However, this second law also makes a reservation: unless
work is done to maintain or prevent this phenomenon. Today, this work is
called refactoring.
Refactorings are code transformations that improve a system’s maintainability without affecting its execution. To explain this definition, let’s divide it into three parts. First, when the definition mentions code transformations, it is referring to modifications in the code, like splitting a function into two, renaming a variable, moving a function to another class, extracting an interface from a class, etc. Next, the definition mentions the goal of such transformations: improve the maintainability of the system, that is, improve its modularity, design, or architecture, enhance its testability, make the code more readable, easier to understand and modify, etc. Finally, a restriction is set: it is not worth improving maintainability but changing the program results. In other words, refactoring must deliver the system working exactly as it was before the transformation. Another way to express this is to assert that refactorings must preserve the system’s behavior or semantics.
However, in the 70s and 80s when Lehman’s Laws were formulated, the term refactoring was not yet used. One of the first uses of this term occurred in 1992, in a doctoral thesis defended by William Opdyke at the University of Illinois (link). Next, in 1999, refactoring, already with this name, was included among the programming practices advocated by Extreme Programming. In the first edition of the XP book, it was recommended that developers should perform refactorings with the goal of restructuring their systems to remove duplication, improve communication with other developers, simplify or make the design more flexible.
In 2000, Martin Fowler released the first edition of a book dedicated to refactoring, which was very successful and contributed to popularizing this practice. One of the reasons for this success was that the book presents a catalog with dozens of refactorings. In a way that resembles a catalog of design patterns (as we studied in Chapter 6), the presentation of the refactorings starts by naming them, which helped create a vocabulary about refactoring. In the book, Fowler also presents the mechanics of each refactoring, includes code examples, and discusses the benefits and disadvantages of the refactorings.
In his book, Fowler also mentions Kent Beck’s phrase that opens this
chapter and emphasizes the importance of following good programming
habits
, which are essential to maintain the program’s health,
ensuring that it continues to provide value over the years. Therefore,
developers should not only carry out corrective, adaptive, and
evolutionary maintenance. It is also important to take care of the
system’s maintainability by performing frequent refactorings.
Notice: The term refactoring has become common in Software Engineering. Therefore, sometimes it is used to indicate improvements in other non-functional requirements, besides maintainability. For example, we often hear developers mention they will refactor their code to improve performance, introduce concurrency, improve the usability of the interface, etc. However, in this book, we use the term restricted to its original definition, that is, only to refer to modifications that improve the maintainability of the source code.
In this section, we will present some widely used refactorings. For each one, we will comment on the name, motivation, and implementation mechanics. Furthermore, we illustrate the presentation with examples of real refactorings performed by open-source developers.
Extract Method is one of the most important refactorings as it aims
to extract a piece of code from method f
and move it to a
new method g
. Subsequently, f
will include a
call to g
. The following code illustrates this refactoring.
First, the code before the refactoring:
void f() {
... // A
... // B
... // C
}
And now the code after the extraction of method g
:
void g() { // extracted method
... // B
}
void f () {
... // A
g();
... // C
}
There are also variations in the method extraction mechanics. For
instance, it’s possible to extract multiple methods, denoted as
g1
, g2
, …, gn
, from a single
method f
all at once. Additionally, we can extract the same
code g
from multiple methods, denoted as f1
,
f2
, …, fn
. In this case, the extraction is
used to eliminate code duplication since the code in
g
is repeated in several methods.
To perform a Extract Method, we may have to pass parameters to the new method. This is necessary, for example, if the extracted method needs to access local variables from the original one. Additionally, the extracted method should return values if they are used later by the original method. Finally, if local variables are only used in the extracted code, we should also extract their declarations.
Extract Method is often referred to as the Swiss Army knife of refactorings due to its versatility and wide range of applications. For instance, it can be employed to break down a large method into smaller ones, each serving a specific goal that is reflected in its name. This approach also enhances the comprehensibility of the original method, as its body after the refactoring consists only of a sequence of calls. An example is provided below, and other applications of Extract Method are discussed in a following section.
Example: Here, we provide a real example of Extract
Method from an Android system. This system has an onCreate
method, which uses SQL commands to create the database tables it
manipulates. The code for the initial version of
onCreate
—after a few edits, simplifications, and removal of
comments—is presented below. The original source code exceeds 200
lines.
void onCreate(SQLiteDatabase database) {// before extraction
// creates table 1
database.execSQL("CREATE TABLE " +
CELL_SIGNAL_TABLE + " (" + COLUMN_ID +
" INTEGER PRIMARY KEY AUTOINCREMENT, " + ...
database.execSQL("CREATE INDEX cellID_index ON " + ...);
database.execSQL("CREATE INDEX cellID_timestamp ON " +...);
// creates table 2
String SMS_DATABASE_CREATE = "CREATE TABLE " +
SILENT_SMS_TABLE + " (" + COLUMN_ID +
" INTEGER PRIMARY KEY AUTOINCREMENT, " + ...
database.execSQL(SMS_DATABASE_CREATE);
String ZeroSMS = "INSERT INTO " + SILENT_SMS_TABLE +
" (Address,Display,Class,ServiceCtr,Message) " +
"VALUES ('"+ ...
database.execSQL(ZeroSMS);
// creates table 3
String LOC_DATABASE_CREATE = "CREATE TABLE " +
LOCATION_TABLE + " (" + COLUMN_ID +
" INTEGER PRIMARY KEY AUTOINCREMENT, " + ...
database.execSQL(LOC_DATABASE_CREATE);
// more 200 lines, creating other tables
}
To enhance the clarity of this method, a developer decided to perform
seven Extract Method operations. Each extracted method became
responsible for creating one of the tables. By examining the names of
these methods, we can easily infer the tables they create. After these
extractions, the onCreate
method only calls the extracted
methods, as shown below. The method’s size has been reduced from over
200 lines to just seven lines. It’s also worth noting that the extracted
methods also have a parameter representing the database where the tables
should be created.
public void onCreate(SQLiteDatabase database) {
createCellSignalTable(database);
createSilentSmsTable(database);
createLocationTable(database);
createCellTable(database);
createOpenCellIDTable(database);
createDefaultMCCTable(database);
createEventLogTable(database);
}
In 2016, along with Danilo Silva and Prof. Nikolaos Tsantalis from Concordia University in Montreal, we conducted a study with GitHub developers to discover their motivations for performing refactorings (link). In the study, we identified 11 distinct motivations for method extractions, as shown in the following table.
Motivations for Extract Method | Occurrences |
---|---|
Extracting a method that will be called by other methods | 43 |
Introducing an alternative signature for an existing method | 25 |
Breaking a method into smaller parts to improve understanding | 21 |
Extracting a method to remove code duplication | 15 |
Extracting a method to facilitate the implementation of a new feature or to fix a bug | 14 |
Extracting a method with a better name or fewer parameters | 6 |
Extracting a method to test its code | 6 |
Extracting a method that will be overridden by subclasses | 4 |
Extracting a method to allow recursive calls | 2 |
Extracting a method that will work as a factory | 1 |
Extracting a method that will run on a thread | 1 |
Analyzing this table, we can see that the primary motivation is to
extract code for reuse. Developers often come across a piece of code
they need, only to find it is already implemented within a method
f
. To make this code accessible, they perform a Extract
Method, creating a new method g
. This allows them to call
g
in the code they are working on. Here’s an example of how
one developer reported this motivation for extracting a method:
These refactorings were made because of code reusability. I needed to use the same code in a new method. I always try to reuse code, because when there’s a lot of code redundancy it gets overwhelmingly more complicated to work with the code in future, because when something change in code that is duplicate somewhere, it usually needs to be changed also there.
As seen in the previous table, the second most frequent motivation is introducing an alternative signature for a method. An example is shown below, involving a method that logs a string to a file. We first show the code before the refactoring:
void log(String msg) {
// saves msg to file
}
The developer decided to provide an alternative version of this
method with a boolean parameter, indicating whether the string should
also be printed on the console. Here is the code after extracting this
second log
method:
void log(String msg, boolean console) {
if (console)
System.out.println(msg);
// saves msg to a file
}
void log(String msg) {
log(msg, false);
}
The reader should also observe that, technically, this transformation
is an Extract Method. Initially, the log(String)
code was
extracted to log(String, boolean)
, creating a new method.
Subsequently, minor adjustments were made to the extracted code,
including the addition of an if
statement. Such adjustments
are common and do not disqualify the transformation as a method
extraction.
This refactoring operates in the opposite direction of method extraction. Consider a small method with only one or two lines of code, which are rarely called. Therefore, the benefits offered by this method, both in terms of reuse and organization of the code, are minimal. Consequently, it can be eliminated from the system, and its body can be incorporated into the call sites. It’s important to note that Inline Method is a less common and less significant operation compared to Extract Method.
Example: The following code shows an example of
Inline Method. First, here is the code before the inline. We can see
that the writeContentToFile
method has a single line of
code and is called only once by the write
method.
private void writeContentToFile(final byte[] revision) {
getVirtualFile().setBinaryContent(revision);
}
private void write(byte[] revision) {
VirtualFile virtualFile = getVirtualFile();
...
if (document == null) {
writeContentToFile(revision);
}
...
}
The developers of this code decided to delete the
writeContentToFile
method and expand its body at the single
call site. The code after the refactoring is shown below.
private void write(byte[] revision) {
VirtualFile virtualFile = getVirtualFile();
...
if (document == null) {
virtualFile.setBinaryContent(revision); // after inline
}
...
}
It is not uncommon to find a method implemented in the wrong class.
That is, even though a method f
is located in a class A, it
might use more services from class B. For instance, it may have more
dependencies on elements of B than on elements of its original class. In
these cases, moving f
to B should be considered. This
refactoring can improve the cohesion of A, reduce the coupling between A
and B, and ultimately make both classes easier to understand and
modify.
Due to its characteristics, Move Method is one of the refactorings with the highest potential to improve modularity. As its effects are not restricted to a single class, Move Method can have a positive impact on the system’s architecture, ensuring that methods are placed in the appropriate classes, both from a functional and architectural viewpoint.
Example: Suppose the code of an IDE. In this code,
we have a class called PlatformTestUtil
, which is
responsible for a given feature provided by the IDE. In fact, details
about this feature are not important for our point here and thus we will
omit them. For us, the key point is that this class has a method called
averageAmongMedians
(see below), which calculates the
average of the medians stored in an array. However, this method has no
relation with the other methods in PlatformTestUtil
. For
example, it does not call methods or use attributes of
PlatformTestUtil
. It is, in summary, independent of the
rest of the class.
class PlatformTestUtil {
...
public static long averageAmongMedians(long[] time,
int part) {
int n = time.length;
Arrays.sort(time);
long total = 0;
for (int i= n/2-n/part/2; i< n/2+n/part/2; i++) {
total += time[i];
}
int middlePartLength = n/part;
return middlePartLength == 0 ? 0:total/middlePartLength;
}
...
}
Thus, the developers of this IDE decided to move
averageAmongMedians
to a class called
ArrayUtil
, whose purpose is exactly to provide general
functions for array manipulation. Thus, it is a class more related to
the service provided by averageAmongMedians
.
After the Move Method, the existing calls to
averageAmongMedians
should be updated, as shown in the
figure.
However, as the reader can observe in this figure, this update was
not difficult because averageAmongMedians
is a static
method. Therefore, only the class name was changed at each call
site.
In other cases, however, it’s not simple to update the calls after a Move Method. This happens when there are no references to objects of the target class in the call sites. In such cases, a possible solution is to leave a simple implementation of the method in the original class, which just forwards the calls to the new method’s class. This way, the existing calls do not need to be updated. An example is shown below. First, the original code:
class A {
B b = new B();
void f { ... }
}
class B { ... }
class Client {
A a = new A();
void g() {
a.f();
...
}
}
And now the code after the refactoring:
class A {
B b = new B();
void f {
b.f(); // just forwards the call to B
}
}
class B { // f was moved from A to B
void f { ... }
}
class Client {
A a = new A();
void g() {
a.f(); // no changes
...
}
}
Note that f
was moved from class A to class B. However,
in A, we left a version of the method that simply forwards the call to
B. That’s why the Client
class didn’t need updates.
When they occur in the same hierarchy of classes, Move Method
receives special names. For example, when we move a method from a
subclass to a superclass, it is called Pull Up Method.
To illustrate, suppose the same f
method is implemented in
subclasses B1
and B2
. Thus, this duplicated
implementation can be avoided by pulling up
each implementation
to the superclass A
, as shown next.
Conversely, when a method is moved down the classes hierarchy, that
is, from a superclass to a subclass, we say a Push Down
Method was performed. For instance, despite being implemented
in superclass A
, a method f
might be of
interest to only one subclass, let’s call it B1
. Therefore,
we can push down
its implementation to B1
, as shown
next.
To conclude, refactoring operations can be done in sequence. For
example, consider the following class A
with a method
f
:
class A {
B b = new B();
void f(){
S1;
S2;
}
}
class B { ... }
First, from f
, we’ll extract a method g
with the statement S2
:
class A {
B b = new B();
void g() { // extracted method
S2;
}
void f(){
S1;
g();
}
}
class B { ... }
Next, we’ll move g
to class B
, as
illustrated next:
class A {
B b = new B();
void f(){
S1;
b.g();
}
}
class B {
void g() { // moved method
S2;
}
}
This refactoring is recommended when a program has a class A with many responsibilities and attributes. Some of these attributes are related and can be defined together, but in another location. In other words, they can be extracted into a new class B. After the extraction, we should declare in A an attribute of type B.
Example: Let’s consider the class
Person
shown below. In addition to other attributes, which
have been omitted from the example, it stores the person’s phones,
including area code and number.
class Person {
String areaCode;
String phone;
String alternativeAreaCode;
String alternativePhone;
...
}
Thus, we can extract from Person
a new class, called
Phone
, as shown below. After the refactoring,
Person
gains two attributes of this new type.
class Phone { // extracted class
String areaCode;
String number;
}
class Person {
Phone phone;
Phone alternativePhone;
...
}
A similar refactoring is called Extract Interface.
For example, suppose a library that implements classes
LinkedList
and ArrayList
. From these classes,
we can extract an interface List
, with the methods common
to them. Thus, the library’s clients can start practicing the Prefer
Interfaces to Concrete Classes design principle, which we studied in
Chapter 5.
There’s a provocative phrase, attributed to Phil Karlton, that says
there are only two hard things in Computer Science: cache
invalidation and naming things.
Thus, as naming is hard, we often
have to rename a code element, be it a variable, function, method,
parameter, attribute, class, etc. This may happen because the initial
name of the element was not a good choice. Another reason is that the
responsibility of this element may have changed over time, and thus its
name became outdated. In both cases, it is recommended to perform one of
the most popular refactorings: renaming. That is,
giving a more appropriate and meaningful name to a code element.
When this refactoring is applied, the most complex part is not
changing the name of the element, but updating the points in the code
where it is referenced. For example, if a method f
is
renamed to g
, all calls of f
must be updated.
In fact, if f
is widely used, it may be interesting to
extract it to a new method, with the new name, and keep the old method
deprecated.
To illustrate, suppose the following method f
:
void f () {
// A
}
Next, we show the code after renaming f
to
g
. Note that we have kept a deprecated version of
f
in the code, which just calls g
.
void g() { // new name
// A
}
@deprecated
void f() { // old name
g(); // forwards call to new name
}
Deprecation is a mechanism offered by programming languages to
indicate that a code element is outdated and, therefore, should not be
used anymore. When the compiler finds out that a piece of code is using
a deprecated element, it issues a warning. As we commented in the
previous example, the method f
was not simply renamed to
g
. Instead, first, a method g
was created with
f
’s original code. Then, f
was deprecated, and
its code modified to only call g
.
This deprecation-based strategy makes renaming safer, as it does not
require updating all calls of f
at once. Thus, it gives
time for clients to adapt to the change and start using the new name. In
fact, refactorings—even the simplest ones, like a renaming—should be
performed in small or baby steps to make sure they won’t introduce bugs
in a code that was working before.
The refactorings presented earlier are the ones with the higher potential for improving a system’s design, as they involve operations with a global scope, such as Move Method or Extract Class. However, there are refactorings with a narrow scope that improve, for example, the internal implementation of a single method. Next, we briefly describe some of these refactorings.
Extract Variable is used to simplify expressions and make them easier to read and understand. Here’s an example:
x1 = (-b + sqrt(b*b-4*a*c)) / (2*a);
This code can be refactored to:
delta = b*b-4*a*c; // extracted variable
x1 = (-b + sqrt(delta)) / (2*a);
Note that a new variable delta
was created and
initialized with part of a larger expression. As a result, this
expression, after the refactoring, became shorter and easier to
understand.
Remove Flag is a refactoring that suggests using
commands like break
or return
, instead of
control variables (or flags). Here’s a code example:
boolean search(int x, int[]a) {
boolean found = false;
i = 0;
while (i < a.length) && (!found) {
if (a[i] == x);
found = true;
i++;
}
return found;
}
This code can be refactored as follows:
boolean search(int x, int[]a) {
for (i = 0; i < a.length; i++)
if (a[i] == x)
return true;
return false;
}
After this refactoring, the function became smaller and with a
clearer logic, thanks to using a return
command to exit
immediately once the searched value has been found. The flag variable
found
was then removed.
There are also refactorings to simplify conditional commands. One of
them is called Replace Conditional with Polymorphism.
To understand it, consider a switch
command that returns
the value of a student’s research grant, depending on the student’s
type:
switch (student.type) {
case "undergraduate":
grant = 750;
break;
case "master":
grant = 2000;
break;
case "phd":
grant = 3000;
break;
}
In object-oriented languages, this command can be refactored to a single line of code:
grant = student.getGrant();
In the refactored version, the type
attribute of
Student
is no longer necessary, so it can be removed.
However, we must implement in the Student
’s subclasses—for
example, UndergraduateStudent
, MasterStudent
,
and PhDStudent
—a method getGrant()
. Finally,
in the superclass Student
, this method must be abstract,
that is, it should have only a signature, without a body.
Remove Dead Code recommends deleting methods, classes, variables, or attributes that are no longer used. For example, in the case of a method, there might be no calls to it anymore. In the case of a class, it might not be instantiated or inherited by any other classes. In the case of an attribute, it might not be used within the class body, nor in subclasses or in other classes. It might sound this is a uncommon refactoring, but in large systems, maintained over years by literally hundreds of developers, there is usually a considerable amount of code that is no longer used.
After describing various refactorings in the previous section, let’s now discuss how the practice of refactoring can be adopted in real software projects.
Firstly, successful refactoring depends on the existence of good tests, primarily unit tests. That is, without tests, it becomes risky to make changes in a system, especially when these changes do not add new functionalities or correct bugs, as is the case with refactorings. John Ousterhout comments on the importance of tests during refactoring activities (link, Section 19.3):
Tests, particularly unit tests, play an important role in design because they facilitate refactoring. Without a test suite, its dangerous to make major structural changes to a system. […] As a result, developers avoid refactoring in a system without good test suites; they try to minimize the number of code changes for each new feature or bug fix, which means that complexity accumulates and design mistakes don’t get corrected.
A second important issue concerns when code should be refactored. There are two main ways to perform refactorings: opportunistically or strategically.
Opportunistic refactorings are carried out in the midst of a programming task when we discover that a piece of code is not well implemented and therefore can be improved. This can happen when correcting a bug or implementing a new feature. For instance, in the middle of these tasks, we might notice that a method’s name is unclear, that a method is too long and difficult to understand, that a conditional command is too complex, that some code is no longer used, etc. Thus, whenever we detect problems with a piece of code, we should refactor it as soon as possible. To try to give some concrete guides, suppose we will spend an hour implementing a new feature. Thus, it is desirable to invest part of that time—perhaps 20% or more—in refactorings. Kent Beck has a fascinating quote about opportunistic refactorings:
For each desired change, make the change easy (warning: this may be hard), then make the easy change.
The underlying premise of this recommendation is that we may be
struggling to implement a change precisely because the code is not
prepared to accommodate it. Therefore, we should first take a step
backward
and refactor the code to make the change easy
. Once
done, we can take two steps forward and easily
implement the
change we were assigned.
Most of the time, refactorings are opportunistic. However, it is also possible to have planned refactorings. Usually, they involve more time-consuming and complex changes that cannot be accommodated into a typical development task. Instead, they should be carried out in scheduled and dedicated sessions. For instance, these refactorings might involve breaking a package into two or more subpackages. As a second example, suppose the teams neglected the practice of refactoring for a long time. As there are now numerous pending refactorings, it’s better to schedule them for a specific time period.
Today, most IDEs provide support to automate refactorings in the
following way: developers select the block of code they intend to
refactor and the refactoring operation they wish to perform. Then, the
IDE performs this operation automatically. The following figures
illustrate an automatic renaming via an IDE. Firstly, in this example,
the developer highlights the m1
method:
Next, the developer selects the Refactor and Rename options:
Then the IDE asks for the new name of the method (follow in the next figure). In the same dialog box, the developer requests to update the references to this method.
Afterwards, the IDE carries out the refactoring:
As we can see, the method was renamed to m11
.
Furthermore, the calls in m3
and m4
were
updated to use this new name.
Although it’s called an automated refactoring, the example makes it clear that it’s up to developers to specify the code they want to refactor and the refactoring operation they intend to perform. The developers must also provide information about this refactoring, such as the new name in the case of renaming. From this point on, the refactoring becomes automated.
Before applying the refactoring, the IDE checks whether its
preconditions are true, i.e., if the refactoring will
not result in a compilation error or change the program’s behavior. In
the previous example, if the user asks to rename m1
to
m2
, the IDE informs that this refactoring can’t be
performed, as the class already has a method named m2
.
The task of checking the preconditions of a refactoring, in most
cases, is not simple. To illustrate the complexity of this task, let’s
re-use a small Java program proposed by Friedrich Steimann and Andreas
Thies (link).
As outlined below, this program has two classes, A
and
B
, implemented in distinct files but belonging to the same
package P1
. Thus, the call to m("abc")
in the
first file executes the m(String)
in class
B
.
// File A.java
package P1;
public class A {
void n() {
new B().m("abc"); // executes m(String) from B
}
}
// File B.java
package P1;
public class B {
public void m(Object o) {...}
void m(String s) {...}
}
However, suppose class B
is moved to a new package; for
instance, to P2
:
// File B.java
package P2; // new package for B
public class B {
public void m(Object o) {...}
void m(String s) {...}
}
Although it might seem harmless, this Move Class changes the behavior
of the program. In the new version, the call to m("abc")
results in the execution of m(Object)
and not
m(String)
. The reason is that class B
no
longer resides in the same package as A
. Thus,
m(String)
is no longer visible to A
as it is
not a public method. For those unfamiliar with Java, a public method of
a public class, like m(Object)
, can be called from any part
of the code. But methods without an explicit visibility modifier, like
m(String)
, can only be called by code within the same
package.
In summary, this particular Move Class operation is not a refactoring, as it does not preserve the program’s behavior. If performed with the assistance of an IDE, the IDE should detect this problem and warn the user that the operation can’t be performed because it changes the program’s behavior.
Code Smells—also known as bad smells—are indicators
of low-quality code, that is, code that is difficult to maintain,
understand, modify, or test. In short, code that does not smell
good
and therefore might need refactoring. However, in this
definition, the term indicators
means that we should not consider
that every smell requires an immediate refactor. This decision depends
on other factors, such as the importance of the code containing the
smell and how often it will need maintenance. After clarifying this
point, we will present, in the rest of this section, some examples of
code smells.
Code duplication is a widely-common code smell and among the ones with the highest potential to harm the evolution of a system. Duplicated code increases maintenance effort, as changes need to be applied in more than one part of the code. Consequently, there is a risk of changing one part and forgetting about another. Duplicated code also makes the codebase more complex, as data and commands that could be encapsulated in methods or classes are scattered throughout the system.
To eliminate code duplication, the following refactorings can be used: Extract Method (recommended when the duplicated code is within two or more methods), Extract Class (recommended when the duplicate code refers to attributes and methods that appear in more than one class), and Pull Up Method (recommended when the duplicate code is a method present in two or more subclasses).
Blocks of duplicated code are called clones. However, different criteria can be used to define when two code blocks A and B are, in fact, duplicated. These criteria give rise to four types of clones, as described below:
Type 1: when A and B have the same code, with differences only in comments and spaces.
Type 2: when A and B are Type 1 clones, but the variables used in A and B may have different names.
Type 3: when A and B are Type 2 clones, but with minor changes in commands.
Type 4: when A and B are semantically equivalent, but their implementations use different algorithms.
Example: To illustrate these types of clones, we will use the following function:
int factorial(int n) {
fat = 1;
for (i = 1; i <= n; i++)
fat = fat * i;
return fat;
}
Next, we show four clones of this function.
int factorial(int n) {
fat=1;
for (i=1; i<=n; i++)
fat=fat*i;
return fat; // returns factorial
}
int factorial(int n) {
f = 1;
for (j = 1; j <= n; j++)
f = f * j;
return f;
}
int factorial(int n) {
fat = 1;
for (j = 1; j <= n; j++)
fat = fat * j;
System.out.println(fat); // new command
return fat;
}
int factorial(int n) {
if (n == 0)
return 1;
else return n*factorial(n-1);
}
In these cases, we don’t need to have two or more
factorial
functions. Only one can stay in the code and the
others can be removed.
Real World: In 2013, Auki Yamashita and Leon Moonen, two researchers from a research lab in Norway, published the results of an exploratory study on code smells involving 85 developers (link). When asked about the smells they were usually most concerned with, the most common answer was Duplicated Code, with nearly 20 points on the scale used to rank the responses. In second place, with half the points, was Long Method, which is the smell we will discuss next.
Ideally, methods should be small, with self-explanatory names and few lines of code. Long Methods are considered a code smell, as they make the code harder to understand and maintain. When we find a long method, we should consider use an Extract Method to break it into smaller methods. However, there’s no maximum number of lines of code that can arbitrarily be used to identify long methods because this depends on the programming language, the relevance of the method, the system’s domain, among other factors. Nevertheless, nowadays, there’s a tendency to write small methods, with fewer than, for instance, 20 lines of code.
Like methods, classes shouldn’t assume multiple responsibilities and provide services that aren’t cohesive. Hence, Large Classes are considered a smell because, just like long methods, they make the code harder to understand and maintain. Normally, it’s also more challenging to reuse these classes in another package or system. Large classes are characterized by a large number of attributes with low cohesion among them. The solution to this smell involves using an Extract Class to derive a smaller class B from a larger class A. Thereafter, class A gets an attribute of type B.
In-Depth: When a class grows so much that it starts
to monopolize most of the system’s intelligence, it’s called a
God Class or a Blob. Classes with
generic names like Manager
, System
, or
Subsystem
tend to represent instances of this smell.
This smell corresponds to a method that seems to envy
the data
and methods of another class. In other words, it accesses more
attributes and methods of class B than of its current class A.
Therefore, we may consider using a Move Method to migrate it to the
envied
class.
The fireAreaInvalidated2
method, shown in the following
code, is an example of Feature Envy. It has three method calls, but all
having as the target the same abt
object of type
AbstractTool
. Besides, it accesses no attribute or calls
any method of its current class. Therefore, moving this method to
AbstractTool
should be considered.
public class DrawingEditorProxy
extends AbstractBean implements DrawingEditor {
...
void fireAreaInvalidated2 (AbstractTool abt , Double r ){
Point p1 = abt.getView().drawingToView (...);
Point p2 = abt.getView().drawingToView (...);
Rectangle r=new Rectangle(p1.x,p1.y,p2.x-p1.x p2.y-p1.y);
abt.fireAreaInvalidated (r);
}
...
}
In addition to being small, methods should, whenever possible, have few parameters. That is, a Long Parameters List is a smell that can be eliminated in two main ways. First, we should check whether one of the parameters can be obtained directly by the called method, as shown below:
p2 = p1.f();
g(p1, p2);
In this case, p2
is unnecessary, as it can be obtained
at the beginning of g
:
void g(p1) {
p2 = p1.f(); ...
}
Another possibility is to create a type grouping some of the parameters. For example, consider the following method:
void f(Date start, Date end) {
...
}
Thus, we can create a DateRange
class to represent a
range of dates:
class DateRange {
Date start;
Date end;
}
void f(DateRange range} {
...
}
As we studied in Chapter 5, global variables should be avoided, as they result in a poor type of coupling. For this reason, they are also considered a code smell. Essentially, Global variables make it more difficult to understand a function without looking at other parts of the system. To understand better, consider the following function:
void f(...) {
// computes a certain value x
return x + g; // where g is a global variable
}
Just from analyzing this code, can the reader tell what
f
returns? The answer is no, as we need to know
g
’s value. However, since g
is a global
variable, its value can be changed anywhere in the program, which can
easily introduce bugs in this function because a single, distant, and
unrelated line of code can influence f
’s result. All it
needs to do is to change the value of g
.
It’s also worth noting that in languages like Java, static attributes work exactly like global variables. Therefore, they also represent a code smell.
This code smell occurs when primitive types (that is,
int
, float
, String
, etc.) are
used instead of specific classes. For example, suppose we need to
declare a variable to store a ZIP code for an address. In a rush for
rapidly using the variable, we might declare it as a
String
, instead of creating a dedicated class—for instance,
ZIPCode
—for this purpose. The main advantage is that a
class can provide methods to manipulate the stored values. For example,
the class constructor can check if the informed ZIP code is valid before
initializing the object. This way, the class assumes this responsibility
and, consequently, frees the clients from this concern.
In short, we should not be obsessed
with primitive types.
Instead, we should consider the possibility of creating classes that
encapsulate primitive values and provide operations to handle them. In
the next smell, we’ll complement this recommendation and suggest that
such objects, whenever possible, should also be immutable.
In the second edition of his book on refactoring, Fowler considers Mutable Objects a code smell. A mutable object is one whose state can be modified. On the other hand, an immutable object, once created, can no longer be changed. To facilitate the creation of immutable objects, classes should declare all their attributes as private and final (a final attribute is read-only). The class should also be declared final to prohibit the creation of subclasses. Consequently, if we need to modify an immutable object, the alternative is to create a new instance with the desired state.
For example, String
objects in Java are immutable, as
illustrated by the following program.
class Main {
public static void main(String[] args) {
String s1 = "Hello World";
String s2 = s1.toUpperCase();
System.out.println(s1);
System.out.println(s2);
}
}
This program prints Hello World
and then HELLO WORLD
.
The reason is that the toUpperCase
method does not change
the s1
string, but returns a copy of it in uppercase.
Whenever possible, we should create immutable objects, as they can be
shared freely and safely with other functions. For example, in Java, you
can pass a string as a parameter to a function f
and be
sure that f
will not change its content. This would not
happen if strings were mutable because it would be possible for
f
to change the string. As a second advantage, immutable
objects are thread-safe, meaning there is no need to synchronize the
access of threads to their methods. The reason is simple: the classical
problems of concurrent systems, like race conditions, occur only when
multiple threads can change the state of an object. If the state is
immutable, such problems automatically disappear.
However, we should understand this code smell within the context of
the programming language used in a project. For example, when we use a
functional language, objects are immutable by definition, meaning, this
code smell will never appear. On the other hand, when we use an
imperative language, it is normal to have a certain number of mutable
objects. For this reason, in these languages, we should try to minimize
the number of such objects, without, however, assuming we will
completely eliminate them. For instance, we should make simple and small
objects immutable, like those representing ZIPCode
,
Currency
, Address
, Date
,
Time
, Phone
, Color
,
Email
, etc. To illustrate, we show next the implementation
of an immutable Date
class:
final public class Date { // final: cannot have subclasses
final private int day; // final: read-only attributes
final private int month;
final private int year;
private void check(int day, int month, int year)
throws InvalidDateException {
// verifies valid date
// if not valid, throws InvalidDateException
}
public Date(int day, int month, int year)
throws InvalidDateException {
check(day, month, year);
this.day = day;
this.month = month;
this.year = year;
}
// other methods
}
These classes only have attributes, and no methods. At most, they have getters and setters. However, as is recurrent with code smells, we should not consider any Data Class as a major concern. Instead, the important thing is to analyze the code and try to move behavior to these classes. That is, whenever it makes sense, we should add methods to these classes with operations that are already being done, but throughout other classes.
It is strange to see comments included in a list of code smells. For instance, in introductory programming courses, students are usually encouraged to comment on all the code they produce, with the aim of learning the importance of code documentation. In the book Elements of Programming Style, Brian Kerninghan—one of the creators of the Unix operating system and the C programming language—and P. J. Plauger offer an advice that clarifies the view of comments as smells. They recommend the following:
Don’t comment bad code, rewrite it.
Thus, the idea is that comments should not be used to explain bad code. Instead, we should refactor the code and, thereby, improve its readability. Having done that, there’s a good chance that the comments will no longer be necessary. An example are Long Methods like this one.
void f() {
// task1
...
// task2
...
// taskn
...
}
If we use Extract Method to extract the commented code, we’ll get the following improved version:
void task1 { ... }
void task2 { ... }
void taskn { ... }
void f {
task1();
task2();
...
taskn();
}
After this refactoring, the comments are no longer necessary, as the method names already reveal their purpose.
In-Depth: Technical debt is a term coined by Ward Cunningham in 1992 to designate technical problems that can hinder the maintenance and evolution of a system. Among others, these problems include lack of tests, architectural issues (for example, systems that are a Big Ball of Mud), systems with many code smells, or without any documentation at all. Cunningham’s intention was to create a term understood by managers and people without knowledge of software engineering principles and practices. Hence, he chose the term debt to reinforce that these issues, if not resolved, will require the payment of interest. This interest usually manifests in the form of inflexible and difficult-to-maintain systems, where bug fixes and the implementation of new features increasingly take more time and are more risky.
To illustrate the concept, suppose that adding a new feature F1 in a program would require three days. However, if there were no technical debt, F1 could be implemented in just two days. This one-day difference constitutes the interest charged by the technical debt present in our program. One alternative is to pay off the principal of this debt, that is, completely remove the technical debt. But that might take, for instance, four days. In other words, if we plan to evolve the program with F1 only, there’s no advantage. However, suppose that over the next months we will have to implement more features, such as F2, F3, F4, etc. In this case, eliminating the principal of the technical debt might pay off.
Martin Fowler. Refactoring: Improving the Design of Existing Code, Addison-Wesley, 1st edition, 2000.
Martin Fowler. Refactoring: Improving the Design of Existing Code, Addison-Wesley, 2nd edition, 2018.
Danilo Silva, Nikolaos Tsantalis, Marco Tulio Valente. Why We Refactor? Confessions of GitHub Contributors. Foundations of Software Engineering, 2016.
1. Mark the FALSE alternative:
(a) Refactorings improve the design of a software system.
(b) Refactorings make the code of a system easier to understand.
(c) Refactorings facilitate the location and correction of future bugs.
(d) Refactorings speed up the implementation of new features.
(e) Refactorings improve a system’s runtime performance.
2. The following graph displays the number of features implemented in two similar systems (A and B), developed by comparable teams using the same technologies. In which system do you believe refactorings were systematically performed? Justify your answer.
3. Describe the differences between opportunistic and planned refactorings. Which of these types of refactoring should be more common?
4. Give the names of refactorings A and B that, if executed in sequence, do not change the system’s code. In other words, refactoring B undoes the transformations made by refactoring A.
5. In these examples, extract the lines commented with the word
extract
to a method g
.
class A {
void f() {
int x = 10
x++;
print x; // extract
}
}
class A {
void f() {
int x = 10
x++; // extract
print x; // extract
}
}
class A {
void f() {
int x = 10
x++; // extract
print x; // extract
int y = x+1;
...
}
}
class A {
void f() {
int x = 10
int y; // extract
y = h()*2; // extract
print y; // extract
int z = y+1;
...
}
}
6. The following function calculates the n-th term of the Fibonacci sequence. The first term is 0, the second term is 1, and thereafter, the n-th term is the sum of the preceding two terms.
int fib(int n) {
if (n == 1)
return 0;
if (n == 2)
return 1;
return fib(n-1) + fib(n-2);
}
Describe Types 1, 2, 3, and 4 clones for this function. It is not necessary to implement the clones, but be precise in your answer when indicating the differences between each clone and the original code.
7. Consider the following class that stores a specific value in dollars.
class Currency {
...
private double value = 0.0;
void add(Currency m) {
this.value = this.value + m.getValue();
}
...
}
(a) Why are the objects of this class mutable?
(b) Reimplement the class to ensure that its objects are now immutable.
8. As discussed in Section 9.5, comments that are used to explain
bad code
are a code smell. In these cases, the ideal is to clean
the code, and then, remove the comments. Next, we show another category
of comments that may be deleted. Explain why these comments are
unnecessary.
// Student class
class Student {
// Student's enrollment number
int enrollment;
// Student's date of birth
Date dateOfBirth;
// Student's address
Address address;
// Default constructor of the Student class
Student() {
...
}
...
}
9. Use an IDE to perform a simple refactoring on one of your programs. For example, perform a method renaming. What are the advantages of performing refactorings with the support of IDEs?
10. Use a Java IDE to perform the Move Class refactoring discussed in Section 9.4.1. Did the IDE report any error? If so, describe this error.