Code without tests is bad code. – Michael Feathers
This chapter begins with an introduction to testing, where we discuss the test pyramid and the main types of automated tests (Section 8.1). Then, we present the basic concepts of unit tests (Section 8.2), the principles for writing such tests (Section 8.3), test coverage (Section 8.4), the importance of having designs that promote testability (Section 8.5), and mock objects, which are used to enable the implementation of unit tests (Section 8.6). In Section 8.7, we present the concept of Test-Driven Development (TDD). Next, we tackle the tests at the top of the test pyramid, namely Integration Tests (Section 8.8) and End-to-End Tests (Section 8.9). To conclude the chapter, Section 8.10 provides a brief presentation of other types of tests, such as black-box and white-box tests, acceptance tests, and non-functional requirement tests.
Software is one of the most complex human constructs, as we discussed in Chapter 1. Thus, it is understandable that software systems are susceptible to various kinds of bugs and inconsistencies. To prevent such bugs from reaching customers and causing damage, it is crucial to embrace testing activities in software projects. In fact, testing is one of the most valued programming practices today across all types of software. It is also one of the practices that have undergone the most transformations in recent years.
In the case of Waterfall development, tests occurred in a separate phase, after the requirements, analysis, design, and implementation phases. Moreover, there was a separate test team responsible for verifying whether the implementation met the defined requirements. To check this, tests were often manual, i.e., a person used the system, provided some input, and checked if the outputs were as expected. Thus, the goal of such tests was mainly detecting bugs before the system went into production.
With agile methods, testing has been profoundly revised, as explained below:
A large part of test activities has been automated; in other words, in addition to implementing the system’s classes, developers now write code to test these classes. Thus, programs became self-testable.
Tests are no longer performed only after implementing the system’s classes. In fact, they can be implemented even before these classes.
Large test teams no longer exist—or they are responsible for specific tests. Instead, the developer who implements a class must also implement its tests.
Tests are no longer used only for detecting bugs. This is still important, but tests gained new roles, such as checking if a class continues to work after fixing a bug in another part of the system. Furthermore, tests also help in the documentation of the production code.
These transformations made testing one of the most valued programming practices in modern software development. It is in this context that we should understand Michael Feathers’ quote that opens this chapter: if the code does not have tests, it can be regarded as having low quality or even being legacy code.
In this chapter, we will focus on automated tests because manual tests are labor-intensive, slow, and expensive. Moreover, they must be repeated every time the system undergoes a change.
An interesting way to classify automated tests is through a test pyramid, originally proposed by Mike Cohn (link). As the next figure shows, this pyramid partitions the tests according to their granularity.
Particularly, tests are divided into three groups. Unit tests check small parts of the code, usually a single class (see also the next figures). They form the base of the pyramid, meaning most tests are in this category. Unit tests are simple, easier to implement, and fast to run. On the next level, we have integration tests or service tests that verify a system’s functionality or transaction. Thus, integration tests involve multiple classes from different packages and may include external components like databases. They require more time to implement and are slower to run. Lastly, at the top of the pyramid, we have end-to-end tests, also referred to as user interface tests or system tests. They simulate a user session on the system as authentically as possible. For this reason, they are more expensive, slower, and less numerous. End-to-end tests also tend to be fragile, meaning minor alterations in the user interface might demand changes in these tests.
A generic recommendation is that automated tests should be implemented in the following proportion: 70% as unit tests; 20% as integration tests; and 10% as end-to-end tests (link, Chapter 3).
In this chapter, we will study the three types of tests included in the test pyramid. However, we’ll talk more about unit tests than the other tests, as they are far more common. Before we start, we would like to recall two concepts we introduced in Chapter 1. It is said that a piece of code has a defect—or a bug, more informally—when it does not comply with its specification. If a defective code is executed and causes the program to produce an incorrect result or behavior, we say that a failure has occurred.
Unit tests are automated tests of small units of code, typically classes, which are tested in isolation from the rest of the system. A unit test is a program that calls methods from a class and checks if they return the expected results. Thus, when using unit tests, the code can be divided into two parts: a set of classes—which implement the system’s requirements—and a set of tests, as illustrated in the next figure.
The figure shows a system with n classes and m tests. As can be observed, there isn’t a 1-to-1 correspondence between classes and tests. For instance, a class might have more than one test. This is the case for class C1, which is tested by T1 and T2. This probably occurs because C1 is an important class, which needs to be tested in different contexts. In contrast, C2 doesn’t have tests, either because the developers forgot to implement them or because it’s a less important class.
Unit tests are implemented using frameworks built specifically for this purpose. The most well-known ones are called xUnit frameworks, where the x designates the language used in the implementation of the tests. The first of these frameworks, called SUnit, was implemented by Kent Beck in the late ’80s for Smalltalk. In this chapter, our tests are implemented in Java, using JUnit. The first version of JUnit was implemented by Kent Beck and Erich Gamma, in 1997, during a plane trip between Switzerland and the United States.
Today, there are versions of xUnit frameworks for the main programming languages. Therefore, one of the advantages of unit tests is that developers don’t need to learn a new programming language, as tests are implemented in the same language as the system under test.
To explain unit testing concepts, let’s use a Stack
class:
import java.util.ArrayList;
import java.util.EmptyStackException;

public class Stack<T> {

  private ArrayList<T> elements = new ArrayList<T>();
  private int size = 0;

  public int size() {
    return size;
  }

  public boolean isEmpty() {
    return (size == 0);
  }

  public void push(T elem) {
    elements.add(elem);
    size++;
  }

  public T pop() throws EmptyStackException {
    if (isEmpty())
      throw new EmptyStackException();
    T elem = elements.remove(size - 1);
    size--;
    return elem;
  }
}
JUnit allows implementing classes that will test application classes
like Stack
. By convention, test classes have the same name
as the tested classes, but with a Test
suffix. Therefore,
our first test class is called StackTest
. Meanwhile, test
methods start with the test
prefix and must meet the
following conditions: (1) they must be public since they are called by
JUnit; (2) they do not have parameters; and (3) they must have the
@Test
annotation, which identifies methods that should be
executed during a test.
Here is our first unit test:
import org.junit.Test;
import static org.junit.Assert.assertTrue;

public class StackTest {

  @Test
  public void testEmptyStack() {
    Stack<Integer> stack = new Stack<Integer>();
    boolean empty = stack.isEmpty();
    assertTrue(empty);
  }
}
In this first version, the StackTest
class has only one
test method, which is public, annotated with @Test
, and
named testEmptyStack()
. This method merely creates a stack
and tests if it’s empty.
Test methods have the following structure:
First, we should create the test context, also known as the
fixture. For that, we should instantiate the objects we
intend to test and, if necessary, initialize them. In our first example,
this part of the test only creates a Stack
.
Next, we should call one of the methods of the class being
tested. In this example, we call the isEmpty()
method and
store its result in a local variable.
Finally, we should test if the method’s result is as expected.
For that, a command called assert is used. In fact,
JUnit offers various variations of assert
, but all of them
have the same goal: to test if a particular result is equal to an
expected value. In the example, we use assertTrue
, which
checks if the value passed as a parameter is true.
IDEs offer options to run only the tests of a system, for example,
through a menu option called Run as Test.
In other words, if the
developer selects Run,
they will execute their program normally,
starting with the main
method. However, if they opt for the
Run as Test
option, they will not execute the program, but only
the tests.
The next figure shows the result of executing our first test. The result is displayed in the IDE itself, and the number of failures (zero) indicates that all tests passed. We can also observe that the test ran quickly, in 0.025 seconds.
However, suppose we made an error when implementing the
Stack
class. For example, suppose the size
attribute was initialized with the value 1 instead of zero. In this
case, the test fails, as indicated in the following screenshot.
The messages inform that there was a failure during the execution of
testEmptyStack
. Failure is the term used by JUnit to
indicate tests where the assert
command was not satisfied.
In another IDE window, we can find that the assertion responsible for
the failure is located on line 19 of the StackTest.java
file.
To conclude, let’s present the complete unit test code:
import org.junit.Test;
import org.junit.Before;
import static org.junit.Assert.assertTrue;
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertEquals;

public class StackTest {

  Stack<Integer> stack;

  @Before
  public void init() {
    stack = new Stack<Integer>();
  }

  @Test
  public void testEmptyStack() {
    assertTrue(stack.isEmpty());
  }

  @Test
  public void testNotEmptyStack() {
    stack.push(10);
    assertFalse(stack.isEmpty());
  }

  @Test
  public void testSizeStack() {
    stack.push(10);
    stack.push(20);
    stack.push(30);
    int size = stack.size();
    assertEquals(3, size);
  }

  @Test
  public void testPushPopStack() {
    stack.push(10);
    stack.push(20);
    stack.push(30);
    int result = stack.pop();
    result = stack.pop();
    assertEquals(20, result);
  }

  @Test(expected = java.util.EmptyStackException.class)
  public void testEmptyStackException() {
    stack.push(10);
    int result = stack.pop();
    result = stack.pop();
  }
}
The StackTest
class has five test methods, all annotated
with @Test
. There is also a method called
init()
, with a @Before
annotation. This method
is executed by JUnit before each test method. JUnit works in the
following way: for each test class, it calls each of its
@Test
methods. Each one executes on a different instance of
the test class. That is, before calling a @Test
method,
JUnit instantiates a fresh object of the test class. If the class has
a @Before
method, it is executed before each
@Test
method. In the example, we used a
@Before
method to create the Stack
used by the
@Test
methods. Thus, we avoid repeating this code in each
test.
To make it a bit clearer, we show below the algorithm used by JUnit to execute unit tests:
for each test class TC
  for each method m in TC with @Test annotation
    o = new TC();
    if TC has a method b with @Before annotation
      then o.b();
    o.m();
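To make this algorithm concrete, here is a minimal runner sketched with plain Java reflection. It is not JUnit’s actual implementation; MyTest, MyBefore, MiniRunner, and SampleTest are invented names that mirror the roles of @Test, @Before, the framework, and a test class:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;

// Stand-ins for JUnit's @Test and @Before annotations.
@Retention(RetentionPolicy.RUNTIME) @interface MyTest {}
@Retention(RetentionPolicy.RUNTIME) @interface MyBefore {}

class MiniRunner {
  // Runs every @MyTest method of testClass on a FRESH instance,
  // invoking the @MyBefore method (if any) first; returns the
  // number of tests that completed without throwing.
  static int run(Class<?> testClass) {
    int passed = 0;
    for (Method m : testClass.getDeclaredMethods()) {
      if (!m.isAnnotationPresent(MyTest.class)) continue;
      try {
        Object o = testClass.getDeclaredConstructor().newInstance();
        for (Method b : testClass.getDeclaredMethods()) {
          if (b.isAnnotationPresent(MyBefore.class)) b.invoke(o);
        }
        m.invoke(o);
        passed++;  // no exception raised: the test passed
      } catch (Exception e) {
        // reflection error or assertion failure: the test did not pass
      }
    }
    return passed;
  }
}

// A sample test class exercised by the runner.
class SampleTest {
  static String log = "";
  int value;

  @MyBefore public void init() { value = 10; log += "B"; }

  @MyTest public void testValue() {
    if (value != 10) throw new AssertionError();
    log += "T";
  }
}
```

Calling MiniRunner.run(SampleTest.class) creates a fresh SampleTest instance, calls init(), then testValue(), and reports one passing test, exactly mirroring the pseudocode above.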
Returning to the StackTest
class, another interesting
method is the one that tests the case where a pop()
throws
an EmptyStackException
. This test, which is the last one in
the code, doesn’t have an assert
. The reason is that an
assert
would end up being dead code in its implementation.
Calling a pop()
on an empty stack terminates the execution
with a EmptyStackException
. Consequently, the
assert
wouldn’t be executed. Therefore, the
@Test
annotation has a special attribute that specifies the
exception that should be raised by the test. In summary,
testEmptyStackException
passes if its execution raises an
EmptyStackException
. Otherwise, it fails.
Notice: JUnit has several versions. In this chapter, we are using version 4.12.
Before moving forward, let’s present some definitions:
Test: a method that implements a test. The term
is derived from the @Test
annotation. These methods are
also called test methods.
Fixture: the program state verified by one or
more test methods, including data, objects, etc. The term is reused from
the manufacturing industry, where a fixture is a piece of equipment that
fixes
a piece that you intend to build (see a photo on
Wikipedia). In the context of unit testing, the function of a fixture is
to fix
the state, i.e., the data and objects, verified by the
test.
Test Case: a class with test methods. The name
originates from the first versions of JUnit. In these versions, the test
methods were located in classes that inherited from a
TestCase
class.
Test Suite: a set of test cases, executed by the unit testing framework (in our case, JUnit).
System Under Test (SUT): the system being tested. It’s a generic term, also used in other types of tests, not necessarily unit tests. Sometimes, the term production code is also used.
When should tests be written? There are two main answers to this question. First, you can write the tests after implementing a small functionality. For example, you can implement some methods and then their tests, which should pass. In other words, you write a bit of code and test it, write more code and test it again, and so on.
Alternatively, you can write the tests first, before any production code. Initially, these tests will not pass. Thus, you start with code that only compiles and whose tests fail. Then you implement the production code and test again. Now, the tests should pass. This development style is called Test-Driven Development, and it will be discussed in Section 8.7.
In addition, there are two complementary answers to this question. First, when a user reports a bug, you can start its analysis by writing a test that reproduces the bug and that, therefore, fails. Next, you will correct the bug. If the correction is successful, the test will pass and you have gained an extra test for your suite.
Second, you can write tests when debugging a piece of code. For
example, avoid writing a System.out.println
to manually
test the result of a method. Instead, write a test method. When using a
println
command, it should be removed when the bug is
fixed. Meanwhile, a test can be added to your test suite and executed
periodically to avoid a reintroduction of the bug.
It’s also not advisable to leave the implementation of the tests for the end of the project or sprint, after having implemented all the features—as happens, for example, with Waterfall development. In such cases, the tests might be implemented in a hurry and with low quality. Or they might not be implemented, as the system is already working, and new features may be allocated to the development team. Finally, it’s not advisable that another team or even a third-party company implements the tests. Instead, we recommend that the developer who implemented a class should also implement its tests.
The main benefit of unit testing is detecting bugs before the code goes into production. When the system is under development, the costs to fix bugs are lower. Consequently, in systems with tests, it’s less likely that customers become surprised by bugs.
However, there are two other important benefits. First, unit tests act as a safety net against regressions. We say that a regression occurs when a modification is done in a part of the code—whether to fix a bug, implement a new feature, or perform a refactoring—but it ends up introducing a bug in another part. In other words, the code regressed because something that was working started to fail after the change. However, regressions are less common when there are good tests in place. For that, after completing a change, the developer should run the test suite. If the change introduces a regression, there’s a good chance it will be detected by the tests. In other words, before the change, the tests were passing, but after the change, they started to fail.
Furthermore, unit tests help with the documentation of the production
code. Indeed, by looking at StackTest
code, we can
understand various aspects of our Stack
class. Therefore,
many times, before maintaining a piece of code, a developer should start
by understanding the tests.
Real World: Among the programming practices proposed by agile methods, unit testing is probably the one that has had the greatest impact and is most widely used. Today, a variety of software systems, from companies of all sizes, are developed with the support of unit tests. Next, we highlight examples from two major software companies: Google and Facebook. The comments were extracted from articles that document the development process and practices of these companies:
Unit testing is strongly encouraged and widely practiced at
Google. All code used in production is expected to have unit tests, and
the code review tool will highlight if source files are added without
corresponding tests.
(link)
At Facebook, engineers conduct any unit tests for their newly
developed code. In addition, the code must pass all the accumulated
regression tests, which are administered automatically as part of the
commit and push process.
(link)
In this section, we describe principles for implementing unit tests. The goal is to discuss how to implement tests that can be easily maintained and understood. We also comment on practices that should be avoided in the implementation of unit tests.
Unit tests must have the following characteristics (whose initials give rise to the acronym FIRST):
Fast: Developers must run unit tests frequently to receive feedback about bugs and regressions. Therefore, it’s important that these tests execute fast, for example, in milliseconds. If this is not the case, the test suite should be split into two groups: tests that run fast and therefore are frequently executed; and slower tests, which are executed, for instance, once a day.
Independent: The order of execution of unit tests does not matter. For any tests T1 and T2, running T1 followed by T2 must produce the same result as running T2 and then T1. Indeed, T1 and T2 can also be executed concurrently. For the tests to be independent, T1 should not change any part of the global state that is later used by T2, and vice versa.
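As a hypothetical illustration (all names are invented), the two checks below share a static counter, so running them in a different order changes their outcome, violating independence:

```java
// Global state shared by the two "tests" below.
class Counter {
  static int count = 0;
  static void increment() { count++; }
}

class DependentTests {
  // Passes only when executed before the other test,
  // since it expects the counter to be exactly 1.
  static boolean testFirstIncrement() {
    Counter.increment();
    return Counter.count == 1;
  }

  // Also expects the counter to be exactly 1, so it fails
  // whenever testFirstIncrement has already run.
  static boolean testSecondIncrement() {
    Counter.increment();
    return Counter.count == 1;
  }
}
```

The fix is for each test to create its own Counter instance, or to reset the shared state in a @Before method, so that no test observes the effects of another.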
Repeatable: Unit tests should always provide the same result. That is, if a test T is called n times, the result should be the same in all n executions. Therefore, either T passes in every execution, or it always fails. Tests with non-deterministic results are also called Flaky Tests or Erratic Tests. Concurrency is one of the main causes of flaky behavior. An example is shown in the following test:
@Test
public void exampleFlakyTest() throws InterruptedException {
  TaskResult result = new TaskResult();
  MyMath m = new MyMath();
  m.asyncPI(10, result);
  Thread.sleep(1000);
  assertEquals(3.1415926535, result.get(), 1e-10);
}
This test calls a function that calculates the value of PI, with a
certain accuracy. This function is asynchronous, that is, it runs in a
new thread. In this example, the required accuracy is 10 decimal places.
The test uses a sleep
to wait for the asynchronous function
to finish. However, this command turns the test non-deterministic: if
the function finishes before 1000 milliseconds, the test will pass; but
if execution takes longer, the test will fail. One possible solution is
to only test the synchronous implementation of the function. If this
implementation does not exist, a refactoring can be performed to extract
it from the asynchronous code. In Section 8.5.2, we will give an example
of such refactoring.
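Another deterministic option, sketched below with invented names (AsyncMath, computePI, asyncPI), is to return a CompletableFuture and block on it in the test, which waits exactly until the computation finishes instead of sleeping for a guessed duration:

```java
import java.util.concurrent.CompletableFuture;

class AsyncMath {
  // Synchronous core: approximates PI with the Leibniz series.
  // This method can be tested directly and deterministically.
  static double computePI(long terms) {
    double pi = 0.0;
    for (long i = 0; i < terms; i++) {
      pi += (i % 2 == 0 ? 4.0 : -4.0) / (2 * i + 1);
    }
    return pi;
  }

  // Asynchronous wrapper: runs the same core on another thread.
  static CompletableFuture<Double> asyncPI(long terms) {
    return CompletableFuture.supplyAsync(() -> computePI(terms));
  }
}
```

A test can then assert on computePI directly, or call asyncPI(n).join(), with no timing assumptions involved: join() blocks until the result is available, however long the computation takes.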
We might think that flaky tests are rare, but a study released by Google, covering their own tests, revealed that about 16% of them are subject to non-deterministic results (link). Consequently, these tests may fail not because a bug exists in the code, but due to non-deterministic events, such as a thread taking longer to execute. Flaky tests are problematic because they delay development: programmers spend time investigating the failure, only to find it’s a false alarm.
Self-checking: The result of unit tests should be
easily verifiable. Developers, for instance, should not have to open and
analyze an output file or manually provide input data to interpret the
test result. Instead, the results should be displayed in the IDE,
typically via components that turn green (to indicate all tests have
passed) or red (to signal that a test has failed). Additionally, when a
test fails, it should be possible to quickly identify the location of
the failed assert
command.
Timely: Tests should be written as early as possible, ideally even before the code that needs to be tested. This technique was briefly mentioned at the end of Section 8.2 and will be discussed more deeply in the section on Test-Driven Development (Section 8.7).
Test Smells represent suboptimal implementation decisions in test code, which, in principle, should be avoided. The name is an adaptation, for the context of testing, of the concept of Code Smells or Bad Smells, which we will study in Chapter 9. However, in this chapter, we will already comment on smells that can occur in test code.
An Obscure Test is a long, complex, and difficult-to-understand test. As we’ve mentioned, tests are also used to document the system-under-test. Therefore, it’s important that they follow clear and quickly understandable logic. Ideally, a test, for instance, should test a single requirement of the system-under-test.
Test with Conditional Logic includes code that may not be executed. That is, tests with if commands or loops should be avoided; ideally, the code of unit tests should be linear. Conditional logic in tests is considered a smell because it hinders understanding of the test.
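For example, the hypothetical check below hides its verification inside a loop, so an empty input silently verifies nothing, while the linear version always executes every check (all names are invented for the illustration):

```java
class ConditionalLogicSmell {
  // Smelly style: if values is empty, the loop body never runs
  // and the "test" passes without checking anything.
  static boolean smellyAllPositive(java.util.List<Integer> values) {
    for (int v : values) {
      if (v <= 0) {
        return false;
      }
    }
    return true;
  }

  // Linear style: fixed, known inputs and one explicit check per
  // element, so every check is always executed.
  static boolean linearAllPositive() {
    java.util.List<Integer> values = java.util.List.of(1, 2, 3);
    return values.get(0) > 0
        && values.get(1) > 0
        && values.get(2) > 0;
  }
}
```

The first assertion below passes vacuously on an empty list, which is precisely the danger of conditional logic in tests.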
Code Duplication in tests occurs when there are repeated blocks of code in several test methods.
However, these smells should not be taken literally, i.e., as a situation that needs to be avoided at all costs. Instead, they should be seen as a warning to test developers. When identifying a test smell, developers should consider whether it wouldn’t be possible to produce a simpler, shorter test, with linear code and without code duplication.
Lastly, just like with production code, tests should be frequently refactored to ensure they remain simple, easy to understand, and do not have smells.
Some authors (link) recommend
having at most one assert
per test. That is, they recommend
writing tests as follows:
@Test
public void testEmptyStack() {
  assertTrue(stack.isEmpty());
}

@Test
public void testNotEmptyStack() {
  stack.push(10);
  assertFalse(stack.isEmpty());
}
In other words, they do not recommend using two assert
commands in the same test, as in the following code:
@Test
public void testEmptyStack() {
  assertTrue(stack.isEmpty());
  stack.push(10);
  assertFalse(stack.isEmpty());
}
The first example, which breaks the empty stack test into two, tends to be more readable and easier to understand than the second one, which does everything in a single test. Furthermore, when the tests in the first example fail, it’s simpler to identify the reason for the failure than in the second example, which can fail for two reasons.
However, we should not be dogmatic in following this rule (link, Chapter 4).
The reason is that there are cases where it’s justified to have more
than one assert
per method. For example, suppose we need to
test a getBook
function that returns an object with the
title, author, year, and publisher of a book. In this case, it’s
justified to have four assert
commands in the test,
checking each of the fields of the returned object, as in the following
code.
@Test
public void testBookService() {
  BookService bs = new BookService();
  Book b = bs.getBook(1234);
  assertEquals("Software Engineering", b.getTitle());
  assertEquals("Marco Tulio Valente", b.getAuthor());
  assertEquals("2024", b.getYear());
  assertEquals("ASERG/DCC/UFMG", b.getPublisher());
}
A second exception is when we have a simple method that can be fully covered by a single test method with several asserts. To illustrate, we show the test of the
Strings.repeat
function provided by the
google/guava
library.
@Test
public void testRepeat() {
  String input = "20";
  assertEquals("", Strings.repeat(input, 0));
  assertEquals("20", Strings.repeat(input, 1));
  assertEquals("2020", Strings.repeat(input, 2));
  assertEquals("202020", Strings.repeat(input, 3));
  ...
}
In this test, we have four assertEquals
commands which
test, respectively, the result of repeating a certain string zero, one,
two, and three times.
Test coverage is a metric that helps to determine the number of tests we need to write for a program. It measures the percentage of statements in a program covered by the existing tests, that is:
Test coverage = (number of statements executed by the tests / total number of statements in the program) x 100
There are tools to calculate test coverage. The next figure shows an
example using the tool that comes with the Eclipse IDE. The lines with a
green background—as automatically highlighted by this tool—are the ones
covered by the tests in StackTest
. The only lines that are
not in green are the ones responsible for the method signatures and,
therefore, do not correspond to executable statements. The test coverage
of this example is 100% because the tests executed all statements in the
Stack
class.
Assume now that we did not implement
testEmptyStackException
. That is, we are not testing the
exception that pop()
raises when called with an empty
stack. In this case, the coverage drops to 92.9%, as shown in a next
figure.
In these figures, the green lines are the ones covered by the
execution of the tests. However, there is also a statement marked in
yellow. This color indicates that the command is a branch (in this case,
an if
) and that only one of the possible paths of the
branch (in this case, the false path) was exercised by the tests.
Lastly, there is a line in red. This color indicates lines not covered
by the tests.
In Java, the test coverage tools work by instrumenting the bytecode generated by the language compiler. As shown in the figure with coverage statistics, the previous program, after being compiled, has 52 instructions covered by the tests, out of a total of 56 instructions. Therefore, the test coverage is 52 / 56 = 92.9%.
There is no magic or absolute target number for test coverage. The recommended coverage varies from project to project, depending on the complexity of the requirements, the criticality of the project, etc. In general, it does not need to be 100%, as there are always trivial methods in a system, such as getters and setters. Also, we have methods whose testing is more challenging, like user interface methods or methods with asynchronous behavior.
Therefore, it is not recommended to set a coverage goal that must always be achieved. Instead, we should monitor the evolution of coverage results over time, to check whether developers, for example, are not becoming less committed to writing tests. It is also recommended to carefully assess the statements or methods that are not covered by the existing tests, to confirm that they are not relevant or are indeed more challenging to test.
Given these considerations, teams who value writing tests easily reach coverage close to 70% (link). On the other hand, values below 50% tend to raise concerns (link). Lastly, even when using TDD, test coverage usually does not reach 100%, although it is generally over 90% (link).
Real World: At a Google developers conference, in 2014, some statistics on coverage measures of the company’s systems were presented (see the video). In the median, Google’s systems had 78% statement coverage. As mentioned in the presentation, the recommendation is to reach 85% in most systems, although this is not set in stone. It was also mentioned that coverage varies by programming language. The lowest coverage was for C++ projects, slightly below 60% on average. The highest was measured for Python projects, slightly above 80%.
The definition of coverage, presented before and based on
statements, is the most common one. However, there are other
definitions, such as function coverage (percentage of
functions that are executed by the tests), function call
coverage (among all the statements in a program that call
functions, how many are exercised by the tests), branch
coverage (percentage of branches of a program that are executed
by the tests; an if
always generates two branches: when the
condition is true and when it is false). Statement and branch coverage
are also called C0 Coverage and C1
Coverage, respectively. To illustrate the difference between
both, we will use the following class (first code) and its unit test
(second code):
public class Math {

  public int abs(int x) {
    if (x < 0) {
      x = -x;
    }
    return x;
  }
}

public class MathTest {

  @Test
  public void testAbs() {
    Math m = new Math();
    assertEquals(1, m.abs(-1));
  }
}
Assuming statement coverage, we have 100% coverage. However,
assuming branch coverage, the value is 50% because we only tested one of
the conditions (the true condition) of the if (x < 0)
statement. To achieve 100% branch coverage, we need another
assert
, like: assertEquals(1, m.abs(1))
. Thus,
branch coverage is stricter than statement coverage.
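To see what a branch-coverage tool measures, here is a hand-instrumented sketch of abs (InstrumentedMath and its counters are invented for this illustration; real tools gather the same information by instrumenting the bytecode):

```java
class InstrumentedMath {
  // One flag per branch of the single if in abs().
  static boolean trueBranchHit = false;   // x < 0
  static boolean falseBranchHit = false;  // x >= 0

  static int abs(int x) {
    if (x < 0) {
      trueBranchHit = true;
      x = -x;
    } else {
      falseBranchHit = true;
    }
    return x;
  }

  // Branch coverage: percentage of the if's two branches exercised.
  static double branchCoverage() {
    int hit = (trueBranchHit ? 1 : 0) + (falseBranchHit ? 1 : 0);
    return 100.0 * hit / 2;
  }
}
```

After only abs(-1), as in the test above, branch coverage is 50%; adding a call such as abs(1) exercises the false branch and raises it to 100%.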
Testability refers to how easy it is to test a program. As we have
seen, it is crucial that tests follow the FIRST principles, that they
have few assert
s, and achieve high coverage. However, the
design of the production code should also facilitate the implementation
of tests. This design property is called testability.
In other words, a significant part of the effort in writing good tests
should be allocated to the design of the system under test, not
specifically to the design of the tests.
The good news is that code that follows the design principles we discussed in Chapter 5—such as high cohesion, low coupling, single responsibility, separation between presentation and model, dependency inversion, Demeter, among others—tends to exhibit good testability.
A servlet is a Java technology for implementing dynamic web pages. As an
example, we show next a servlet that calculates a person’s Body Mass
Index, given their weight and height. Our goal here is merely didactic.
Therefore, we will not explain the entire protocol for implementing
servlets. Moreover, the logic of this example is very simple, consisting
of the following formula: weight / (height * height)
. But
try to imagine that it can be more complex; even in this case, the
solution presented here will apply.
public class BMIServlet extends HttpServlet {

  public void doGet(HttpServletRequest req, HttpServletResponse res) {
    res.setContentType("text/html");
    PrintWriter out = res.getWriter();
    String weight = req.getParameter("weight");
    String height = req.getParameter("height");
    try {
      double w = Double.parseDouble(weight);
      double h = Double.parseDouble(height);
      double bmi = w / (h * h);
      out.println("Body Mass Index (BMI): " + bmi);
    }
    catch (NumberFormatException e) {
      out.println("Data must be numeric");
    }
  }
}
First, notice that it's not simple to write a test for BMIServlet, as it depends on other types from Java's Servlet package. For example, it is not straightforward to instantiate a BMIServlet object and then call doGet. If we take this approach, we would also need to create HttpServletRequest and HttpServletResponse objects to pass as parameters to doGet. However, these types might rely on other types, and so on. In summary, the testability of BMIServlet is low.
An alternative to testing this class is to extract the domain logic to a separate class, as shown in the next code. This makes it easier to test the new domain class, called BMIModel, as it does not depend on Servlet-related types. For example, it is now straightforward to create a BMIModel object. However, after this refactoring, we still won't be testing the complete code. Even so, it is better to test the domain part of the program than to leave its entire code uncovered by tests.
class BMIModel {
  public double calculateBMI(String w1, String h1)
      throws NumberFormatException {
    double w = Double.parseDouble(w1);
    double h = Double.parseDouble(h1);
    return w / (h * h);
  }
}
public class BMIServlet extends HttpServlet {
  BMIModel model = new BMIModel();
  public void doGet(HttpServletRequest req,
                    HttpServletResponse res) throws IOException {
    res.setContentType("text/html");
    PrintWriter out = res.getWriter();
    String weight = req.getParameter("weight");
    String height = req.getParameter("height");
    try {
      double bmi = model.calculateBMI(weight, height);
      out.println("Body Mass Index (BMI): " + bmi);
    } catch (NumberFormatException e) {
      out.println("Data must be numeric");
    }
  }
}
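To illustrate how straightforward the extracted class is to test, here is a sketch of such a test. We use plain assertions instead of JUnit so the example is self-contained; the BMIModel class is reproduced from the code above as a nested class:

```java
public class BMIModelTest {
    // reproduced from the text, nested here only to keep the
    // sketch self-contained
    static class BMIModel {
        public double calculateBMI(String w1, String h1)
                throws NumberFormatException {
            double w = Double.parseDouble(w1);
            double h = Double.parseDouble(h1);
            return w / (h * h);
        }
    }

    public static void main(String[] args) {
        BMIModel m = new BMIModel();
        // 82 / (1.80 * 1.80) = 25.3086...
        double bmi = m.calculateBMI("82", "1.80");
        if (Math.abs(bmi - 25.3086) > 0.001) throw new AssertionError();
        // non-numeric input must raise NumberFormatException
        boolean raised = false;
        try {
            m.calculateBMI("abc", "1.80");
        } catch (NumberFormatException e) {
            raised = true;
        }
        if (!raised) throw new AssertionError();
    }
}
```

Note that no Servlet-related object is needed: we just instantiate the class and call its method, which is exactly what good testability means here.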
Next, we show the implementation of the asyncPI function discussed in Section 8.3 when presenting the FIRST principles and, specifically, the concept of repeatable tests. As we explained, it's not simple to test an asynchronous function, since its result is computed by another thread. The test in Section 8.3 used a Thread.sleep to wait for the result of asyncPI. However, this command makes the test non-deterministic (or flaky).
public class MyMath {
  public void asyncPI(int prec, TaskResult task) {
    new Thread(new Runnable() {
      public void run() {
        double pi = 0.0; // calculates PI with precision prec
        task.setResult(pi);
      }
    }).start();
  }
}
Next, we show a solution to improve the testability of this class. First, we extract the code that computes PI's value into a separate and synchronous function, called syncPI. This way, only this function will be tested by a unit test. In summary, the observation we made earlier still holds: it's better to extract a function that is easy to test than to leave the whole code untested.
public class MyMath {
  public double syncPI(int prec) {
    double pi = 0.0; // calculates PI with precision prec
    return pi;
  }
  public void asyncPI(int prec, TaskResult task) {
    new Thread(new Runnable() {
      public void run() {
        double pi = syncPI(prec);
        task.setResult(pi);
      }
    }).start();
  }
}
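A unit test for syncPI is now trivial to write. As a sketch, assume a concrete implementation based on the Leibniz series; the text leaves the algorithm unspecified, so this is just an illustrative stand-in, with prec interpreted as the number of terms to sum:

```java
public class MyMathSketch {
    // illustrative stand-in for syncPI: Leibniz series,
    // 'prec' taken as the number of terms
    public double syncPI(int prec) {
        double pi = 0.0;
        for (int i = 0; i < prec; i++)
            pi += (i % 2 == 0 ? 4.0 : -4.0) / (2 * i + 1);
        return pi;
    }

    public static void main(String[] args) {
        // fast and deterministic: no threads, no Thread.sleep
        double pi = new MyMathSketch().syncPI(1_000_000);
        if (Math.abs(pi - Math.PI) > 1e-5) throw new AssertionError();
    }
}
```

Because the test calls a synchronous function, it is repeatable: it never depends on thread scheduling or on an arbitrary sleep interval.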
To explain the role of mocks in unit tests, let’s start with a motivating example and discuss why it is difficult to write a unit test for it. Then, we will introduce the concept of mocks as a solution to test this example.
Notice: In this chapter, we use the term mock with the same meaning as stub. We made this decision because several testing tools follow the same convention. However, we include a subsection later to emphasize that some authors distinguish between these terms.
Motivating Example: To illustrate the concept of mocks, let's start with a simple class for book searching, whose code is shown below. This class, called BookSearch, implements a getBook method that searches for books on a remote service. This service, in turn, implements the BookService interface. To make the example more realistic, let's assume that BookService represents a REST API. Regardless of that, the crucial point is that the search is conducted on another server, abstracted by the BookService interface. This server returns its result as a JSON document, i.e., a text document. Consequently, the getBook method accesses the remote server, retrieves the response in JSON format, and creates a Book object to store the search result. To keep the example clear, we omit the code for the Book class, but it has fields containing data about books and their corresponding getters.
import org.json.JSONObject;

public class BookSearch {
  BookService rbs;
  public BookSearch(BookService rbs) {
    this.rbs = rbs;
  }
  public Book getBook(int isbn) {
    String json = rbs.search(isbn);
    JSONObject obj = new JSONObject(json);
    String title = (String) obj.get("title");
    return new Book(title);
  }
}

public interface BookService {
  String search(int isbn);
}
Problem: We need to implement a unit test for BookSearch. However, by definition, a unit test exercises a small component of the program, such as a single class. The problem is that to test BookSearch we need a BookService, which is an external service. That is, if we are not careful, the test will reach an external service. This is problematic for two reasons: (1) the scope of the test will be larger than a small unit of code; (2) the test will be slower, since it is accessing a remote service, using a network protocol. However, unit tests should be fast, as recommended by the FIRST principles that we studied in Section 8.3.
Solution: One solution is to create an object that emulates the real object, but only for testing purposes. This kind of object is called a mock (or stub). In our example, the mock must implement the BookService interface and, therefore, the search method. However, this implementation is partial, as the mock just returns the titles of some books without accessing remote servers. An example is shown below:
import org.junit.*;
import static org.junit.Assert.*;

class BookConst {
  public static String SOFTENG =
    "{ \"title\": \"Software Engineering\" }";
  public static String NULLBOOK =
    "{ \"title\": \"NULL\" }";
}

class MockBookService implements BookService {
  public String search(int isbn) {
    if (isbn == 1234)
      return BookConst.SOFTENG;
    return BookConst.NULLBOOK;
  }
}

public class BookSearchTest {
  private BookService service;

  @Before
  public void init() {
    service = new MockBookService();
  }

  @Test
  public void testGetBook() {
    BookSearch bs = new BookSearch(service);
    String title = bs.getBook(1234).getTitle();
    assertEquals("Software Engineering", title);
  }
}
In this example, MockBookService is a class used to create mocks of BookService, i.e., objects that implement this interface with a trivial behavior. Particularly, the mock object, named service, only returns data about the book with ISBN 1234. The purpose of this mock is to allow the implementation of a test that does not access a remote and slow service. In the testGetBook method, we first use the mock to create an object of type BookSearch. Then, we call the getBook method to search for a book and return its title. Finally, we execute an assertEquals. As the test uses a MockBookService, it checks whether the returned title is the one that this mock returns for ISBN 1234.
However, one question remains: what does testGetBook actually test? In other words, what requirements are being verified with such a simple mock object? In this case, we are not testing access to the remote service, as mentioned earlier. This requirement is too complex for unit tests. Instead, we are just testing whether the logic of creating a Book object from a JSON document is working as expected. In a more comprehensive test, we can include additional fields in Book besides the title. Additionally, we can test with more books by extending the mock.
Mocks (or stubs) are so common in unit tests that there are frameworks that facilitate their creation. We won’t delve into the details of these frameworks, but we at least present the code of the previous test using a mock created by a popular framework called Mockito (link).
import org.junit.*;
import static org.junit.Assert.*;
import org.mockito.Mockito;
import static org.mockito.Mockito.when;
import static org.mockito.Matchers.anyInt;

public class BookSearchTest {
  private BookService service;

  @Before
  public void init() {
    service = Mockito.mock(BookService.class);
    when(service.search(anyInt())).
      thenReturn(BookConst.NULLBOOK);
    when(service.search(1234)).thenReturn(BookConst.SOFTENG);
  }

  @Test
  public void testGetBook() {
    BookSearch bs = new BookSearch(service);
    String title = bs.getBook(1234).getTitle();
    assertEquals("Software Engineering", title);
  }
}
First, we can see that there is no longer a MockBookService class. The main benefit of using a mock framework is precisely this: no longer having to implement mocks manually. Instead, the mock for BookService is created by the framework itself, using the reflection features of Java. We just need to call the Mockito.mock(type) function, as follows:
service = Mockito.mock(BookService.class);
However, the mock service is initially created without any behavior. We then have to teach it to behave at least in some situations. Specifically, we have to teach it to respond to some book searches. For this, Mockito offers a simple domain-specific language, based on Java syntax. An example is shown below:
when(service.search(anyInt())).thenReturn(BookConst.NULLBOOK);
when(service.search(1234)).thenReturn(BookConst.SOFTENG);
These two lines program our mock. First, we command it to return BookConst.NULLBOOK when the search method is called with any integer as an argument. Then, we open an exception to this general rule: when search is called with the argument 1234, it should return the JSON string that describes the SOFTENG book.
Some authors, such as Gerard Meszaros (link), make a distinction between mocks and stubs. According to them, mocks emulate not only the state of the System Under Test (SUT) but also its behavior. When mocks only verify the state (as in our example), they should be called stubs, according to Meszaros. However, in this book, we will not make this distinction. We find it subtle, and therefore, the benefits do not outweigh the cost of extra paragraphs to explain similar concepts.
However, just to clarify a bit more, a behavioral test—also called an interaction test—checks for events (e.g., method calls) that occur during the execution of the tests. Here is an example:
@Test
void testBehaviour() {
  // sut: the system under test, instantiated elsewhere
  Mailer m = mock(Mailer.class);
  sut.someBusinessLogic(m);
  verify(m).send(anyString());
}
In this example, the verify command, provided by Mockito, is similar to an assert. However, it checks if an event occurred with the mock passed as an argument. In this case, we verify if the mock's send method was executed at least once, using any string as an argument.
Indeed, according to Meszaros, mocks and stubs are special cases of test doubles. Besides mocks and stubs, there are two other types of doubles:
Dummy Objects are passed as arguments to a method but are not used in the method's body. Thus, they serve only to satisfy the language's type system.
Fake Objects have a simpler implementation than a real object. For example, they can simulate a database in main memory.
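As an illustration of the last category, here is a hypothetical fake object. The repository interface and all names below are ours, invented for this sketch; the fake replaces a database with a HashMap in main memory:

```java
import java.util.HashMap;
import java.util.Map;

// hypothetical repository interface, for illustration only
interface BookRepository {
    void save(int isbn, String title);
    String findTitle(int isbn);
}

// Fake Object: a working but simplified implementation that
// keeps the data in main memory instead of a real database
class InMemoryBookRepository implements BookRepository {
    private final Map<Integer, String> books = new HashMap<>();
    public void save(int isbn, String title) {
        books.put(isbn, title);
    }
    public String findTitle(int isbn) {
        return books.get(isbn); // null if the book is unknown
    }
}

public class FakeExample {
    public static void main(String[] args) {
        BookRepository repo = new InMemoryBookRepository();
        repo.save(1234, "Software Engineering");
        if (!"Software Engineering".equals(repo.findTitle(1234)))
            throw new AssertionError();
    }
}
```

Unlike a mock, which is programmed with canned answers for a specific test, a fake has a genuine (if simplified) implementation that works for any sequence of calls.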
In the previous section, we discussed the test of a servlet that calculates the Body Mass Index (BMI) of a person. We argued that testing this servlet is challenging due to its complex dependencies, which are difficult to recreate in a test. Now, however, we know that we can create mocks for these dependencies, i.e., objects that emulate the real dependencies but only respond to the calls needed in our test.
First, let’s reintroduce the servlet we want to test:
public class BMIServlet extends HttpServlet {
  BMIModel model = new BMIModel();
  public void doGet(HttpServletRequest req,
                    HttpServletResponse res) throws IOException {
    res.setContentType("text/html");
    PrintWriter out = res.getWriter();
    String weight = req.getParameter("weight");
    String height = req.getParameter("height");
    double bmi = model.calculateBMI(weight, height);
    out.println("BMI: " + bmi);
  }
}
And here is the new test for this servlet (it is an adaptation of an example used in an article by Dave Thomas and Andy Hunt). First, in the init method, we create mocks for the HttpServletRequest and HttpServletResponse objects. These mocks are used as parameters for the doGet call made in the test method. Still in init, we create a StringWriter object that allows output in the form of a list of strings. Then, this object is encapsulated by a PrintWriter, which is the output object used by the servlet—that is, this is an example of the Decorator design pattern, which we studied in Chapter 6. Finally, we program the response of the mock: when the servlet asks for an output object, by calling getWriter(), it should return the PrintWriter object we just created. In summary, we did all these steps to change the servlet output to a list of strings.
public class BMIServletTest {
  HttpServletRequest req;
  HttpServletResponse res;
  StringWriter sw;

  @Before
  public void init() throws IOException {
    req = Mockito.mock(HttpServletRequest.class);
    res = Mockito.mock(HttpServletResponse.class);
    sw = new StringWriter();
    PrintWriter pw = new PrintWriter(sw);
    when(res.getWriter()).thenReturn(pw);
  }

  @Test
  public void testDoGet() throws IOException {
    when(req.getParameter("weight")).thenReturn("82");
    when(req.getParameter("height")).thenReturn("1.80");
    new BMIServlet().doGet(req, res);
    assertEquals("BMI: 25.3\n", sw.toString());
  }
}
In the testDoGet method, we begin by programming the mock with the input parameters of the servlet. When the servlet requests the weight parameter, the mock returns 82; when it requests the height parameter, it returns 1.80. After that, the test follows the typical flow of unit tests: we call the method we want to test, doGet, and check if it returns the expected result.
This example also illustrates the disadvantages of using mocks. The primary drawback is that mocks increase the coupling between the test and the SUT. Typically, in unit tests, the test calls the tested method and checks its result. This way, the test doesn't break when the internal code of the tested method is modified. However, when using mocks, this is no longer true, as the mock can depend on internal structures or events of the tested method, making the tests fragile. For instance, if the servlet's output changes to Body Mass Index (BMI): [value], we must update the assertEquals in the unit test.
Finally, it’s important to note that not all objects and methods can be mocked. Generally, the following structures cannot be mocked: final classes and methods, static methods, and constructors.
Test-Driven Development (TDD) is one of the programming practices proposed by Extreme Programming (XP). At first, the idea seems counterintuitive: given a unit test T for a class C, TDD argues that we should implement T before C. For this reason, this technique is also known as Test-First Development.
When we write the test first, it is going to fail. Thus, in the workflow advocated by TDD, the next step is to write the code that makes this test pass, even if it is initially just trivial code. Then, this code should be completed and refined. Finally, if necessary, it should be refactored to improve its design, readability, and maintainability, and to follow design principles and patterns.
TDD was proposed with three objectives in mind:
TDD prevents developers from forgetting to write tests. The reason is that TDD promotes testing as the first activity of any programming task, be it fixing a bug or implementing a new feature. Hence, it becomes more difficult to postpone the writing of tests to a later moment. Indeed, as we mentioned in Section 8.4, when using TDD, test coverage is usually greater than 90%.
TDD encourages writing code with high testability. This benefit is a natural consequence of the workflow inversion proposed by TDD: as developers have to write first the test T and then the class C, it is natural that they will design C to facilitate the writing of tests.
TDD is not only a testing but also a design practice. This happens because developers, by starting with the tests T, put themselves in the position of a user of the class C. In other words, with TDD, the first user of the class is its own developer—remember that T is a client of C since it calls methods from C. Therefore, it is expected that developers will define a simple interface for the class, using readable method names and avoiding many parameters, for example.
When working with TDD, developers should follow a cycle composed of three states, as shown in the next figure.
According to this diagram, the first goal is to reach the red state when the test is not yet passing. It may seem strange, but the red state is already a small victory: by writing a test that fails, developers have a specification of the class that they need to implement next. As we have mentioned, in this state, it is also important that the developers think about the interface of the class under test, putting themselves in the position of a user of this class. Lastly, it is important that the class compiles. For this, developers must define at least the name of the class and the signature of its methods.
Next, the goal is to reach the green state. To do this, developers must implement the full functionality of the class under test, and thus the tests will start to pass. However, this implementation can be performed in baby steps. Perhaps, in the initial steps, the code will work only partially, for example, returning only constants. This process will become clearer in the example that we will present soon.
Finally, developers should look for opportunities to refactor the class and the test. When using TDD, the goal is not just to reach the green state, when the program is working. In addition, developers should check the quality of this code. For example, they should check whether there is no duplicate code, whether there are large methods that can be broken into smaller ones, whether there are methods that can be moved to a different class, etc. After the refactoring step, we can finish or restart the cycle to implement another feature.
To conclude, let's simulate a programming session using TDD. For this, we will use a virtual bookstore system as an example. In this system, we have a Book class, with the attributes title, isbn, and price. And we also have a ShoppingCart class, which stores the books the customer decided to buy. This class must have methods to add a book to the cart, return the total price of the books in the cart, and remove a book from the cart. Next, we describe the implementation of these methods using TDD.
Red State: We start by defining that ShoppingCart has an add and a getTotal method. Besides defining the names of these methods, we define their parameters and write the first test:
@Test
void testAddGetTotal() {
  Book b1 = new Book("book1", 10, "1");
  Book b2 = new Book("book2", 20, "2");
  ShoppingCart cart = new ShoppingCart();
  cart.add(b1);
  cart.add(b2);
  assertEquals(30.0, cart.getTotal());
}
Despite being simple and easy to understand, this test does not compile, as there is no implementation for the Book and ShoppingCart classes. Thus, we have to provide one, as shown next:
public class Book {
  public String title;
  public double price;
  public String isbn;

  public Book(String title, double price, String isbn) {
    this.title = title;
    this.price = price;
    this.isbn = isbn;
  }
}

public class ShoppingCart {
  public ShoppingCart() {}
  public void add(Book b) {}
  double getTotal() {
    return 0.0;
  }
}
The implementation of both classes is very simple. We implemented just the minimum for the program to compile. Note, for example, that getTotal just returns 0.0. Despite this, we achieved our goal in the red state: we now have a test that compiles, runs, and fails!
Green State: The previous test can be seen as a specification for what we need to implement in ShoppingCart. So let's do that:
public class ShoppingCart {
  public ShoppingCart() {}
  public void add(Book b) {}
  double getTotal() {
    return 30.0;
  }
}
However, the reader may be surprised again: this implementation is incorrect! The ShoppingCart constructor is empty, the class does not have attributes, getTotal always returns 30.0, etc. All of this is true, but we have achieved another small victory: the test changed from red to green. So, it is passing. With TDD, the improvements are always small. In XP's vocabulary, they are called baby steps.
However, we should continue and provide a more realistic implementation for ShoppingCart. Here it is:
public class ShoppingCart {
  private ArrayList<Book> items;
  private double total;

  public ShoppingCart() {
    items = new ArrayList<Book>();
    total = 0.0;
  }

  public void add(Book b) {
    items.add(b);
    total += b.price;
  }

  double getTotal() {
    return total;
  }
}
Now we have a list to store the cart items, an attribute to store the total value of the books, a constructor, an add method that adds the books to the list and increases the cart's total, and so on. So, to the best of our understanding, this implementation meets the class specification, and thus we have reached the green state.
Yellow State: Finally, we should look at the code that was implemented and put into practice the properties, principles, and design patterns we learned in the previous chapters. In other words: is there anything we can do to make this code more readable, easier to understand, and easier to maintain? In this case, one idea that may arise is to encapsulate the Book fields. They are currently public, so we can implement getter methods to access them. As this implementation is simple, we won't show the refactored code here.
At this point, we completed an iteration in the red-green-refactor TDD cycle. Now, we can stop, or think about implementing another requirement. For example, we can implement the method to remove books from the cart. For this, we should start another cycle.
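For illustration, here is a sketch of how that next cycle could end up. The remove method, its name, and its signature are our assumption; the sketch shows both the test that would drive the cycle and a possible implementation that makes it pass (plain assertions are used instead of JUnit, so the example is self-contained):

```java
import java.util.ArrayList;

public class RemoveCycleSketch {
    static class Book {
        String title; double price; String isbn;
        Book(String title, double price, String isbn) {
            this.title = title; this.price = price; this.isbn = isbn;
        }
    }

    static class ShoppingCart {
        private final ArrayList<Book> items = new ArrayList<>();
        private double total = 0.0;
        void add(Book b) { items.add(b); total += b.price; }
        // hypothetical remove method, driven by the test below;
        // the total is only decreased if the book was in the cart
        void remove(Book b) {
            if (items.remove(b)) total -= b.price;
        }
        double getTotal() { return total; }
    }

    // the "test" that would be written first, in the red state
    public static void main(String[] args) {
        Book b1 = new Book("book1", 10, "1");
        Book b2 = new Book("book2", 20, "2");
        ShoppingCart cart = new ShoppingCart();
        cart.add(b1);
        cart.add(b2);
        cart.remove(b1);
        if (cart.getTotal() != 20.0) throw new AssertionError();
    }
}
```

In a real TDD session, only the test would exist at first (red); the remove body above would then be written in baby steps until the test passes (green), and finally refactored if needed.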
With integration tests—also referred to as service tests—we move to an intermediate level of the testing pyramid (see the figure of this pyramid in the first section of the chapter). Thus, the objective shifts from testing a small unit of code, like a single class, to exercising a complete service, that is, a complete feature of the system. Therefore, integration tests involve more classes, sometimes from distinct packages. They also test dependencies on real systems, such as databases and remote services. Moreover, when implementing integration tests, we don't use mocks. As these are larger tests, they take more time to run and, consequently, are executed less frequently.
Consider a simple app to add, remove, and edit appointments, as illustrated in the next figure. In this app, there is a class with methods to handle the appointments, as shown below:
public class AgendaFacade { // method bodies omitted
  public AgendaFacade(DB db) { ... }
  int addAppointment(Appointment p) { ... }
  void removeAppointment(int id) { ... }
  Appointment[] listAppointments() { ... }
}
Thus, we can write the following integration test for this class:
@Test
void AgendaFacadeTest() {
  DB db = DB.create();
  AgendaFacade agenda = new AgendaFacade(db);
  Appointment app1 = new Appointment(...);
  Appointment app2 = new Appointment(...);
  Appointment app3 = new Appointment(...);
  int id1 = agenda.addAppointment(app1);
  int id2 = agenda.addAppointment(app2);
  int id3 = agenda.addAppointment(app3);
  Appointment[] apps = agenda.listAppointments();
  assertEquals(3, apps.length);
}
It is worth mentioning two points about this test. First, it is implemented using JUnit, like the previous unit tests we studied in this chapter. That is, JUnit can be used for both unit and integration tests. Second, since it is an integration test, the class is tested with a real dependency, in this case, a database. At the beginning of the test, this database is created with all the tables empty. Then, three appointments are saved and retrieved from the database. Finally, an assert is called. Thus, this test exercises the main methods of our app, except those related to its graphical interface.
End-to-end tests—also called system tests or interface tests—are positioned at the top of the testing pyramid. These are tests that simulate the use of a system by a real user. They are the most expensive tests, requiring more effort to implement and taking the longest to execute.
Selenium is a framework for automating the testing of web systems. The framework allows the implementation of tests that act like robots: opening web pages, filling out forms, clicking buttons, checking responses, etc. An example—extracted and adapted from the Selenium documentation (link)—is shown below. This code simulates a Firefox user making a Google search for the word software. The test prints out the title of the page with the results of this search.
public class SeleniumExample {
  public static void main(String[] args) {
    // creates a driver to access a web server
    WebDriver driver = new FirefoxDriver();
    // instructs the driver to "navigate" to Google
    driver.navigate().to("http://www.google.com");
    // gets a data input field, named "q"
    WebElement element = driver.findElement(By.name("q"));
    // fills this field with the word "software"
    element.sendKeys("software");
    // submits the data
    element.submit();
    // waits for the response page to load (8s timeout)
    (new WebDriverWait(driver, 8)).
      until(new ExpectedCondition<Boolean>() {
        public Boolean apply(WebDriver d) {
          return d.getTitle().toLowerCase()
                  .startsWith("software");
        }
      });
    // result should be: "software - Google Search"
    System.out.println("Page title is: " + driver.getTitle());
    // closes the browser
    driver.quit();
  }
}
Interface tests are harder to write, at least compared to unit tests and even integration tests. For example, the Selenium API is more complex than that of JUnit. Also, the test must handle interface events, like timeouts that occur when a page takes longer than usual to load. Interface tests are also more fragile, meaning they can break due to minor changes in the interface. For example, if the name of the search field on Google’s main page changes, the previous test has to be updated. However, when compared to the alternative—conducting the test manually—they are still competitive and have their benefits.
When implementing a compiler, we can use both unit and integration tests. But end-to-end tests, in this case, tend to be conceptually simpler. The reason is that a compiler interface doesn’t have windows and pages with graphical elements. Instead, a compiler receives an input file and produces an output file. Thus, to implement end-to-end tests for a compiler C for language X, we should create a set of programs in X, exercising various aspects of this language. For each program P, we should define a set of input and output data. Preferably, the output should be in a simple format, like a list of strings. In this context, the end-to-end tests are as follows: first, call C to compile each program P; then, run P with the defined input and verify if the result is as expected. This script is an end-to-end test, as we are exercising all modules of the compiler.
When compared to unit tests, it is harder to locate the code responsible for a failure in end-to-end tests. For example, in the case of the compiler tests, we will receive an indication that a program is not executing correctly. However, it might be challenging to map this failure to the compiler function responsible for the buggy code.
Testing techniques can be classified as black-box or white-box. When using a black-box technique, tests are written considering only the interface of the code under test. For example, if the goal is to test a method as a black-box, the only available information is its name, parameters, return types, and exceptions. On the other hand, when using a white-box technique, the implementation of the tests considers information about the code and its structure. Black-box testing techniques are also referred to as functional tests, and white-box techniques are called structural tests.
However, it is not straightforward to classify unit tests into either of these categories. Indeed, the classification depends on how the tests are written. If the unit tests are written using information only about the interface of the methods under test, they are considered black-box. However, if their implementation considers information about test coverage, such as branches that are covered or not, then they are white-box tests. In summary, unit tests test a small and isolated unit of code. This unit can be tested in the form of a black-box (considering only its interface and specification) or in the form of a white-box (considering and taking advantage of its internal structure for implementing more effective tests).
A similar observation can be made about the relationship between TDD and black-box/white-box testing. To clarify this relationship, let’s reproduce the following comment from Kent Beck (source: Test-Driven Development Violates the Dichotomies of Testing, Three Rivers Institute, 2007):
Another misleading dichotomy is between black-box and white-box tests. Since TDD tests are written before the code they are to test, they are black-box tests. However, I commonly get the inspiration for the next test from looking at the code, a hallmark of white-box testing.
When adopting black-box testing, there are techniques to assist in the selection of the inputs that will be tested. For example, Equivalence Classes is a technique that recommends dividing the inputs of a program into sets of values that have the same chance of presenting a bug. These sets are called equivalence classes. For each equivalence class, we should test only one of its values, which can be selected randomly. Suppose a function that calculates the income tax to pay for each salary range, as illustrated in the next table. Partitioning based on equivalence classes recommends testing this function with four salaries, one from each salary range.
| Salary | Tax Rate |
|---|---|
| From 1,903.99 to 2,826.65 | 7.5% |
| From 2,826.66 to 3,751.05 | 15% |
| From 3,751.06 to 4,664.68 | 22.5% |
| Above 4,664.68 | 27.5% |
Boundary Value Analysis is a complementary technique that recommends also testing with the boundary values of each equivalence class and with the values that precede or succeed such boundaries. The reason is that bugs are often caused by inappropriately handling boundary conditions. Thus, in our example, for the first salary range, we should also test with these values: 1,903.98 (immediately before the lower boundary), 1,903.99 (the lower boundary), 2,826.65 (the upper boundary), and 2,826.66 (immediately after the upper boundary).
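As a sketch, assume a hypothetical taxRate function implementing the table above (its name and the assumption that salaries below the first range are tax-exempt are ours). A boundary-value test for the first range exercises its two boundaries and their immediate neighbors:

```java
public class TaxBoundaryTest {
    // hypothetical function implementing the table above;
    // salaries below the first range are assumed tax-exempt (0%)
    static double taxRate(double salary) {
        if (salary < 1903.99) return 0.0;
        if (salary <= 2826.65) return 7.5;
        if (salary <= 3751.05) return 15.0;
        if (salary <= 4664.68) return 22.5;
        return 27.5;
    }

    public static void main(String[] args) {
        // values around the boundaries of the first range
        if (taxRate(1903.98) != 0.0) throw new AssertionError();
        if (taxRate(1903.99) != 7.5) throw new AssertionError();
        if (taxRate(2826.65) != 7.5) throw new AssertionError();
        if (taxRate(2826.66) != 15.0) throw new AssertionError();
    }
}
```

A common bug this catches is using < where <= was intended (or vice versa): with such a bug, one of the four boundary checks above would fail, while a single value from the middle of the range would pass.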
However, as the reader might be wondering, it is not always straightforward to define the equivalence classes for the input domain of a function. That is, not all functions are organized into well-defined input ranges like those in our example.
To conclude, we note that exhaustive testing, i.e., testing a program with all possible inputs, is impossible in practice, even for small programs. For example, even a function with only two integer parameters would take years to be tested with all possible pairs of integers. Random tests, where the test data is chosen randomly, are also not sufficient in most cases. The reason is that we may select several values from the same equivalence class, which is redundant, while other equivalence classes are left untested.
These are tests carried out by the customers, using their data. The results will determine whether or not the customers agree with the implemented software. If they agree, the system can be put into production. If they do not, the necessary adjustments need to be made. For example, when using agile methods, a user story is only considered finished after it passes the acceptance tests defined and conducted by the Product Owner.
Acceptance tests have two characteristics that distinguish them from the tests we’ve studied earlier in this chapter. First, they are usually manual tests, carried out by the customers or their representatives. Second, they are not exclusively a verification activity (as with the previous tests), but also a software validation activity. As we studied in Chapter 1, verification tests if we’ve implemented the software correctly, that is, in line with its specification. Meanwhile, validation tests if we’ve implemented the correct software, that is, the one requested and required by the customers.
Acceptance tests are commonly divided into two main types. Alpha tests are conducted with customers, but in a controlled environment, such as the developer’s machine. If the system passes such tests, a test with a larger customer group can be undertaken, this time no longer in a controlled environment. These tests are referred to as beta tests.
The testing strategies presented so far check only functional requirements; hence, their goal is to find bugs. However, it is also possible to perform tests that verify non-functional requirements. For example, there are tools that support the execution of performance tests, which check the system's behavior under a given load. An e-commerce company can use such tools to simulate the performance of its website during major events, like Black Friday. Usability tests are used to evaluate the system's user interface and frequently involve observing real users using the system. Failure tests simulate abnormal events in a system, for example, the failure of some services or even an entire data center.
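As a rough illustration of the idea behind performance tests, the sketch below times a hypothetical request handler under a synthetic load and compares the elapsed time against a budget. The handler, the load size, and the 5-second budget are all assumptions made up for this example:

```java
public class LoadTestSketch {

    // Hypothetical operation under test, standing in for a real
    // request handler of, say, an e-commerce checkout.
    static String handleRequest(int id) {
        return "order-" + id + ":confirmed";
    }

    public static void main(String[] args) {
        int requests = 100_000; // synthetic load
        long start = System.nanoTime();
        for (int i = 0; i < requests; i++) {
            handleRequest(i);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(requests + " requests handled in " + elapsedMs + " ms");

        // A real performance test would compare the measurement against a
        // budget derived from the expected peak load (e.g., Black Friday traffic).
        if (elapsedMs > 5_000) {
            throw new AssertionError("latency budget exceeded");
        }
        System.out.println("within the latency budget");
    }
}
```

In practice, dedicated load-testing tools generate the traffic and collect the measurements; the point here is only the shape of the check: a load, a measurement, and a budget.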
Gerard Meszaros. xUnit Test Patterns: Refactoring Test Code. Addison-Wesley, 2007.
Kent Beck, Erich Gamma. Test-infected: programmers love writing tests. Java Report, 3(7):37-50, 1998.
Kent Beck. Test-Driven Development: By Example. Addison-Wesley, 2002.
Dave Thomas, Andy Hunt. Mock Objects. IEEE Software, 2002.
Maurício Aniche. Testes automatizados de software: um guia prático. Casa do Código, 2015.
Jeff Langr, Andy Hunt, Dave Thomas. Pragmatic Unit Testing in Java 8 with JUnit. Pragmatic Bookshelf, 2015.
1. Describe three benefits associated with unit testing.
2. Suppose a function fib(n) that returns the n-th term of the Fibonacci sequence, i.e., fib(0) = 0, fib(1) = 1, fib(2) = 1, fib(3) = 2, fib(4) = 3, etc. Write a unit test for this function.
3. Rewrite the following test, which checks the occurrence of an EmptyStackException, to make it simpler and easier to understand.
@Test
public void testEmptyStackException() {
  boolean success = false;
  try {
    Stack<Integer> s = new Stack<Integer>();
    s.push(10);
    int r = s.pop();
    r = s.pop();
  } catch (EmptyStackException e) {
    success = true;
  }
  assertTrue(success);
}
4. Suppose a developer wrote the following test for the Java ArrayList class. As you'll notice, several System.out.println calls are used in this code. Thus, in essence, it's a manual test, as the developer has to manually check the results. Rewrite each test (from 1 to 6) as a unit test.
import java.util.List;
import java.util.ArrayList;

public class Main {
  public static void main(String[] args) {
    // test 1
    List<Integer> s = new ArrayList<Integer>();
    System.out.println(s.isEmpty());
    // test 2
    s = new ArrayList<Integer>();
    s.add(1);
    System.out.println(s.isEmpty());
    // test 3
    s = new ArrayList<Integer>();
    s.add(1);
    s.add(2);
    s.add(3);
    System.out.println(s.size());
    System.out.println(s.get(0));
    System.out.println(s.get(1));
    System.out.println(s.get(2));
    // test 4
    s = new ArrayList<Integer>();
    s.add(1);
    s.add(2);
    s.add(3);
    int elem = s.remove(2);
    System.out.println(elem);
    System.out.println(s.get(0));
    System.out.println(s.get(1));
    // test 5
    s = new ArrayList<Integer>();
    s.add(1);
    s.remove(0);
    System.out.println(s.size());
    System.out.println(s.isEmpty());
    // test 6
    try {
      s = new ArrayList<Integer>();
      s.add(1);
      s.add(2);
      s.remove(2);
    } catch (IndexOutOfBoundsException e) {
      System.out.println("IndexOutOfBound");
    }
  }
}
5. The following function has four statements, including two if statements, which thus generate four branches:
void f(int x, int y) {
  if (x > 0) {
    x = 2 * x;
    if (y > 0) {
      y = 2 * y;
    }
  }
}
With the previous observation in mind, fill the following table with the statement and branch coverage obtained from the tests specified in the first column. In other words, the first column defines the calls that are tested.
| Test Call | Statement Coverage | Branch Coverage |
|---|---|---|
| f(0,0) | | |
| f(1,1) | | |
| f(0,0) and f(1,1) | | |
6. Students get an A in a course if they score 90 or more. Thus, consider the following function that checks this requirement:
boolean isScoreA(int grade) {
  if (grade > 90)
    return true;
  else
    return false;
}
The implementation of this function has three statements, including one if, resulting in two branches. Now, answer the following questions:
- Does this function have a bug? If yes, when does it result in a failure?
- Suppose the function is tested with two grades: 85 and 95. What is the statement coverage in this case? And the branch coverage?
- Consider the following sentence: if a program has 100% coverage both at the statement and branch level, it is bug-free. Is it true or false? Justify your answer.
7. Complete the assert commands in the indicated sections.
@Test
public void test1() {
  LinkedList list = mock(LinkedList.class);
  when(list.size()).thenReturn(10);
  assertEquals(___________, ___________);
}

@Test
public void test2() {
  LinkedList list = mock(LinkedList.class);
  when(list.get(0)).thenReturn("Software");
  when(list.get(1)).thenReturn("Engineering");
  String result = list.get(0) + " " + list.get(1);
  assertEquals(___________, ___________);
}
8. Suppose two classes A and B, with A using B. To enable unit testing of A, a mock for B, called BMock, was created. The unit test of A is passing. However, the integration test of A and B is failing. Thus, describe a more realistic scenario, in which A, B, and BMock are classes with methods performing real functions. The proposed scenario should include a bug hidden by BMock. In other words, B has a bug that only appears in the integration test.