All rights reserved. Version for personal use only.

Software Engineering: A Modern Approach

Marco Tulio Valente

8 Testing

Code without tests is bad code. – Michael Feathers

This chapter begins with an introduction to testing, where we discuss the test pyramid and the main types of automated tests (Section 8.1). Then, we present the basic concepts of unit tests (Section 8.2), the principles for writing such tests (Section 8.3), test coverage (Section 8.4), the importance of having designs that promote testability (Section 8.5), and mock objects, which are used to enable the implementation of unit tests (Section 8.6). In Section 8.7, we present the concept of Test-Driven Development (TDD). Next, we tackle the tests at the top of the test pyramid, namely Integration Tests (Section 8.8) and End-to-End Tests (Section 8.9). To conclude the chapter, Section 8.10 provides a brief presentation of other types of tests, such as black-box and white-box tests, acceptance tests, and non-functional requirement tests.

8.1 Introduction

Software is one of the most complex human constructs, as we discussed in Chapter 1. Thus, it is understandable that software systems are susceptible to various kinds of bugs and inconsistencies. To prevent such bugs from reaching customers and causing damage, it is crucial to embrace testing activities in software projects. In fact, testing is one of the most valued programming practices today across all types of software. It is also one of the practices that have undergone the most transformations in recent years.

In the case of Waterfall development, tests occurred in a separate phase, after the requirements, analysis, design, and implementation phases. Moreover, there was a separate test team responsible for verifying whether the implementation met the defined requirements. To check this, tests were often manual, i.e., a person used the system, provided some input, and checked if the outputs were as expected. Thus, the goal of such tests was mainly detecting bugs before the system went into production.

With agile methods, testing has been profoundly revised, as explained below:

  • A large part of test activities has been automated; in other words, in addition to implementing the system’s classes, developers now write code to test these classes. Thus, programs became self-testable.

  • Tests are no longer performed only after implementing the system’s classes. In fact, they can be implemented even before these classes.

  • Large test teams no longer exist—or they are responsible for specific tests. Instead, the developer who implements a class must also implement its tests.

  • Tests are no longer used only for detecting bugs. This is still important, but tests gained new roles, such as checking if a class continues to work after fixing a bug in another part of the system. Furthermore, tests also help in the documentation of the production code.

These transformations made testing one of the most valued programming practices in modern software development. It is in this context that we should understand Michael Feathers’ quote that opens this chapter: if the code does not have tests, it can be regarded as having low quality or even being legacy code.

In this chapter, we will focus on automated tests because manual tests are labor-intensive, slow, and expensive. Moreover, they must be repeated every time the system undergoes a change.

An interesting way to classify automated tests is through a test pyramid, originally proposed by Mike Cohn (link). As the next figure shows, this pyramid partitions the tests according to their granularity.

Test pyramid

Particularly, tests are divided into three groups. Unit tests check small parts of the code, usually a single class (see also the next figures). They form the base of the pyramid, meaning most tests are in this category. Unit tests are simple, easier to implement, and fast to run. On the next level, we have integration tests or service tests that verify a system’s functionality or transaction. Thus, integration tests involve multiple classes from different packages and may include external components like databases. They require more time to implement and are slower to run. Lastly, at the top of the pyramid, we have end-to-end tests, also referred to as user interface tests or system tests. They simulate a user session on the system as authentically as possible. For this reason, they are more expensive, slower, and less numerous. End-to-end tests also tend to be fragile, meaning minor alterations in the user interface might demand changes in these tests.

Unit test scope
Integration test scope
End-to-end test scope

A generic recommendation is that automated tests should be implemented in the following proportion: 70% as unit tests; 20% as integration tests; and 10% as end-to-end tests (link, Chapter 3).

In this chapter, we will study the three types of tests included in the test pyramid. However, we’ll talk more about unit tests than the other tests, as they are far more common. Before we start, we would like to recall two concepts we introduced in Chapter 1. It is said that a piece of code has a defect—or a bug, more informally—when it does not comply with its specification. If a defective code is executed and causes the program to produce an incorrect result or behavior, we say that a failure has occurred.
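To make the distinction concrete, the following sketch shows a method with a defect that produces a failure only for some inputs. The class DefectSketch, the method average, and its specification are hypothetical and were chosen only for illustration.

```java
class DefectSketch {

  // Assumed specification: return the mean of a and b, rounded toward zero.
  // Defect: the sum a + b can overflow the int range.
  static int average(int a, int b) {
    return (a + b) / 2;
  }

  public static void main(String[] args) {
    // The defective statement is executed, but the result is correct: no failure.
    System.out.println(average(2, 4)); // prints 3

    // The same statement now overflows and the result is wrong: a failure occurs.
    System.out.println(average(Integer.MAX_VALUE, Integer.MAX_VALUE)); // prints -1
  }
}
```

The defect is present in both calls, but only the second one exposes it as a failure. This is why tests need to exercise the code with inputs that are likely to trigger its defects.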

8.2 Unit Testing

Unit tests are automated tests of small units of code, typically classes, which are tested in isolation from the rest of the system. A unit test is a program that calls methods from a class and checks if they return the expected results. Thus, when using unit tests, the code can be divided into two parts: a set of classes—which implement the system’s requirements—and a set of tests, as illustrated in the next figure.

Correspondence between classes and unit tests

The figure shows a system with n classes and m tests. As can be observed, there isn’t a 1 to 1 correspondence between classes and tests. For instance, a class might have more than one test. This is the case for class C1, which is tested by T1 and T2. Probably, this occurs because C1 is an important class, which needs to be tested in different contexts. In contrast, C2 doesn’t have tests, whether because the developers forgot to implement them or because it’s a less important class.

Unit tests are implemented using frameworks built specifically for this purpose. The most well-known ones are called xUnit frameworks, where the x designates the language used in the implementation of the tests. The first of these frameworks, called sUnit, was implemented by Kent Beck in the late ’80s for Smalltalk. In this chapter, our tests are implemented in Java, using JUnit. The first version of JUnit was implemented by Kent Beck and Erich Gamma, in 1997, during a plane trip between Switzerland and the United States.

Today, there are versions of xUnit frameworks for the main programming languages. Therefore, one of the advantages of unit tests is that developers don’t need to learn a new programming language, as tests are implemented in the same language as the system under test.

To explain unit testing concepts, let’s use a Stack class:

import java.util.ArrayList;
import java.util.EmptyStackException;

public class Stack<T> {

  private ArrayList<T> elements = new ArrayList<T>();
  private int size = 0;

  public int size() {
    return size;
  }

  public boolean isEmpty() {
    return (size == 0);
  }

  public void push(T elem) {
    elements.add(elem);
    size++;
  }

  public T pop() throws EmptyStackException {
    if (isEmpty())
       throw new EmptyStackException();
    T elem = elements.remove(size-1);
    size--;
    return elem;
  }
}

JUnit allows implementing classes that will test application classes like Stack. By convention, test classes have the same name as the tested classes, but with a Test suffix. Therefore, our first test class is called StackTest. Meanwhile, test methods start with the test prefix and must meet the following conditions: (1) they must be public since they are called by JUnit; (2) they do not have parameters; and (3) they must have the @Test annotation, which identifies methods that should be executed during a test.

Here is our first unit test:

import org.junit.Test;
import static org.junit.Assert.assertTrue;

public class StackTest {

  @Test
  public void testEmptyStack() {
    Stack<Integer> stack = new Stack<Integer>();
    boolean empty = stack.isEmpty();
    assertTrue(empty);
  }

}

In this first version, the StackTest class has only one test method, which is public, annotated with @Test, and named testEmptyStack(). This method merely creates a stack and tests if it’s empty.

Test methods have the following structure:

  • First, we should create the test context, also known as the fixture. For that, we should instantiate the objects we intend to test and, if necessary, initialize them. In our first example, this part of the test only creates a Stack.

  • Next, we should call one of the methods of the class being tested. In this example, we call the isEmpty() method and store its result in a local variable.

  • Finally, we should test if the method’s result is as expected. For that, a command called assert is used. In fact, JUnit offers several variants of assert, but all of them have the same goal: to check whether a particular result matches the expected one. In the example, we use assertTrue, which checks if the value passed as a parameter is true.

IDEs offer options to run only the tests of a system, for example, through a menu option called Run as Test. In other words, if the developer selects Run, they will execute their program normally, starting with the main method. However, if they opt for the Run as Test option, they will not execute the program, but only the tests.

The next figure shows the result of executing our first test. The result is displayed in the IDE itself, and the number of failures indicates that all tests passed. We can also observe that the test ran quickly, in 0.025 seconds.

All tests passed

However, suppose we made an error when implementing the Stack class. For example, suppose the size attribute was initialized with the value 1 instead of zero. In this case, the test fails, as indicated in the following screenshot.

Failed test

The messages inform that there was a failure during the execution of testEmptyStack. Failure is the term used by JUnit to indicate tests where the assert command was not satisfied. In another IDE window, we can find that the assertion responsible for the failure is located on line 19 of the StackTest.java file.

Assertion responsible for the failure

To conclude, let’s present the complete unit test code:

import org.junit.Test;
import org.junit.Before;
import static org.junit.Assert.assertTrue;
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertEquals;

public class StackTest {

  Stack<Integer> stack;

  @Before
  public void init() {
    stack = new Stack<Integer>();
  }

  @Test
  public void testEmptyStack() {
    assertTrue(stack.isEmpty());
  }

  @Test
  public void testNotEmptyStack() {
    stack.push(10);
    assertFalse(stack.isEmpty());
  }

  @Test
  public void testSizeStack() {
    stack.push(10);
    stack.push(20);
    stack.push(30);
    int size = stack.size();
    assertEquals(3,size);
  }

  @Test
  public void testPushPopStack() {
    stack.push(10);
    stack.push(20);
    stack.push(30);
    int result = stack.pop();
    result = stack.pop();
    assertEquals(20,result);
  }

  @Test(expected = java.util.EmptyStackException.class)
  public void testEmptyStackException() {
    stack.push(10);
    int result = stack.pop();
    result = stack.pop();
  }

}

The StackTest class has five test methods, all annotated with @Test. There is also a method called init(), with a @Before annotation. This method is executed by JUnit before each test method. JUnit works in the following way: for each test class, it calls each of its @Test methods. Each one executes on a different instance of the test class. That is, before calling a @Test method, JUnit instantiates a fresh object of the test class. If the class has a @Before method, it is executed before each @Test method. In the example, we used a @Before method to create the Stack used by the @Test methods. Thus, we avoid repeating this code in each test.

To make it a bit clearer, we show below the algorithm used by JUnit to execute unit tests:

for each test class TC
  for each method m in TC with @Test annotation
    o = new TC();
    if TC has a method b with @Before annotation
         then o.b();
    o.m();
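The algorithm above can be sketched in plain Java using reflection. This is only an illustration under stated assumptions, not JUnit's actual implementation: the annotations MiniTest and MiniBefore, the MiniRunner class, and the CounterTest class are hypothetical names introduced here.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for JUnit's @Test and @Before annotations.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface MiniTest {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface MiniBefore {}

// Hypothetical runner that follows the algorithm described in the text.
class MiniRunner {

  // Runs every @MiniTest method of testClass; returns the names of the tests that passed.
  static List<String> run(Class<?> testClass) {
    List<String> passed = new ArrayList<>();
    for (Method m : testClass.getDeclaredMethods()) {
      if (!m.isAnnotationPresent(MiniTest.class)) continue;
      try {
        // o = new TC(): a fresh instance of the test class for each test method
        Constructor<?> ctor = testClass.getDeclaredConstructor();
        ctor.setAccessible(true);
        Object o = ctor.newInstance();
        // o.b(): run the @MiniBefore methods on that instance
        for (Method b : testClass.getDeclaredMethods()) {
          if (b.isAnnotationPresent(MiniBefore.class)) {
            b.setAccessible(true);
            b.invoke(o);
          }
        }
        // o.m(): run the test itself
        m.setAccessible(true);
        m.invoke(o);
        passed.add(m.getName());
      } catch (InvocationTargetException e) {
        // The test (or its setup) threw an exception: it did not pass.
      } catch (ReflectiveOperationException e) {
        throw new IllegalStateException("Cannot run " + m.getName(), e);
      }
    }
    return passed;
  }

  public static void main(String[] args) {
    System.out.println("Passed: " + run(CounterTest.class));
  }
}

// Sample test class: both tests pass only because each one gets a fresh object.
class CounterTest {
  private int counter;

  @MiniBefore
  public void init() { counter = 10; }

  @MiniTest
  public void testIncrement() {
    counter++;
    if (counter != 11) throw new AssertionError("expected 11 but was " + counter);
  }

  @MiniTest
  public void testDecrement() {
    counter--;
    if (counter != 9) throw new AssertionError("expected 9 but was " + counter);
  }
}
```

Because a new CounterTest object is created and initialized before each test, the change made by one test to the counter field is never visible to the other, regardless of the order in which reflection returns the methods.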

Returning to the StackTest class, another interesting method is the one that tests the case where a pop() throws an EmptyStackException. This test, which is the last one in the code, doesn’t have an assert. The reason is that an assert would end up being dead code in its implementation. Calling a pop() on an empty stack terminates the execution with an EmptyStackException. Consequently, the assert wouldn’t be executed. Therefore, the @Test annotation has a special attribute that specifies the exception that should be raised by the test. In summary, testEmptyStackException passes if its execution raises an EmptyStackException. Otherwise, it fails.
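The idea behind the expected attribute can be sketched in plain Java, as follows. This is an illustration of the concept, not JUnit's code: the class ExpectedExceptionSketch and the helper throwsExpected are hypothetical, and java.util.Stack is used here as a stand-in for the Stack class of this chapter.

```java
import java.util.EmptyStackException;
import java.util.Stack;

class ExpectedExceptionSketch {

  // Hypothetical helper: returns true only if the action throws
  // an exception of the expected type.
  static boolean throwsExpected(Class<? extends Throwable> expected, Runnable action) {
    try {
      action.run();
    } catch (Throwable t) {
      return expected.isInstance(t); // wrong exception type: the test fails
    }
    return false; // no exception at all: the test also fails
  }

  public static void main(String[] args) {
    Stack<Integer> stack = new Stack<>();
    stack.push(10);
    stack.pop();

    // The second pop() is called on an empty stack.
    boolean passed = throwsExpected(EmptyStackException.class, () -> stack.pop());
    System.out.println(passed); // prints true

    // An action that throws nothing does not satisfy the expectation.
    System.out.println(throwsExpected(EmptyStackException.class, () -> {})); // prints false
  }
}
```

In this sketch, the check happens in the code that calls the test body, not inside it. This is why the test method itself needs no assert.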

Notice: JUnit has several versions. In this chapter, we are using version 4.12.

8.2.1 Definitions

Before moving forward, let’s present some definitions:

  • Test: a method that implements a test. The term is derived from the @Test annotation. These methods are also called test methods.

  • Fixture: the program state verified by one or more test methods, including data, objects, etc. The term is reused from the manufacturing industry, where a fixture is a piece of equipment that fixes a piece that you intend to build (see a photo on Wikipedia). In the context of unit testing, the function of a fixture is to fix the state, i.e., the data and objects, verified by the test.

  • Test Case: a class with test methods. The name originates from the first versions of JUnit. In these versions, the test methods were located in classes that inherited from a TestCase class.

  • Test Suite: a set of test cases, which are executed by the unit testing framework, which in our case is JUnit.

  • System Under Test (SUT): the system being tested. It’s a generic term, also used in other types of tests, not necessarily unit tests. Sometimes, the term production code is also used.

8.2.2 When to Write Unit Tests?

There are two main answers to this question. First, you can write the tests after implementing a small functionality. For example, you can implement some methods and then their tests, which should pass. In other words, you write a bit of code and test it, write more code and test it again, and so on.

Alternatively, you can write the tests first, before any production code. Initially, these tests will not pass. Thus, you start with code that only compiles and whose tests fail. Then you implement the production code and test again. Now, the tests should pass. This development style is called Test-Driven Development, and it will be discussed in Section 8.7.

However, there are two complementary answers to the question proposed in this section. First, when a user reports a bug, you can start its analysis by writing a test that reproduces the bug and that, therefore, fails. Next, you will correct the bug. If the correction is successful, the test will pass and you have gained an extra test for your suite.

Second, you can write tests when debugging a piece of code. For example, avoid writing a System.out.println to manually test the result of a method. Instead, write a test method. When using a println command, it should be removed when the bug is fixed. Meanwhile, a test can be added to your test suite and executed periodically to avoid a reintroduction of the bug.

It’s also not advisable to leave the implementation of the tests for the end of the project or sprint, after having implemented all the features—as happens, for example, with Waterfall development. In such cases, the tests might be implemented in a hurry and with low quality. Or they might not be implemented, as the system is already working, and new features may be allocated to the development team. Finally, it’s not advisable that another team or even a third-party company implements the tests. Instead, we recommend that the developer who implemented a class should also implement its tests.

8.2.3 Benefits

The main benefit of unit testing is detecting bugs before the code goes into production. When the system is under development, the costs to fix bugs are lower. Consequently, in systems with tests, it’s less likely that customers become surprised by bugs.

However, there are two other important benefits. First, unit tests act as a safety net against regressions. We say that a regression occurs when a modification is done in a part of the code—whether to fix a bug, implement a new feature, or perform a refactoring—but it ends up introducing a bug in another part. In other words, the code regressed because something that was working started to fail after the change. However, regressions are less common when there are good tests in place. For that, after completing a change, the developer should run the test suite. If the change introduces a regression, there’s a good chance it will be detected by the tests. In other words, before the change, the tests were passing, but after the change, they started to fail.

Furthermore, unit tests help with the documentation of the production code. Indeed, by looking at StackTest code, we can understand various aspects of our Stack class. Therefore, many times, before maintaining a piece of code, a developer should start by understanding the tests.

Real World: Among the programming practices proposed by agile methods, unit testing is probably the one that has had the greatest impact and is most widely used. Today, a variety of software systems, from companies of all sizes, are developed with the support of unit tests. Next, we highlight examples from two major software companies: Google and Facebook. The comments were extracted from articles that document the development process and practices of these companies:

  • Unit testing is strongly encouraged and widely practiced at Google. All code used in production is expected to have unit tests, and the code review tool will highlight if source files are added without corresponding tests. (link)

  • At Facebook, engineers conduct any unit tests for their newly developed code. In addition, the code must pass all the accumulated regression tests, which are administered automatically as part of the commit and push process. (link)

8.3 Principles and Smells

In this section, we describe principles for implementing unit tests. The goal is to discuss how to implement tests that can be easily maintained and understood. Then, we also comment on things that should be avoided in the implementation of unit tests.

8.3.1 FIRST Principles

Unit tests must have the following characteristics (whose initials give rise to the acronym FIRST):

Fast: Developers must run unit tests frequently to receive feedback about bugs and regressions. Therefore, it’s important that these tests execute fast, for example, in milliseconds. If this is not the case, the test suite should be split into two groups: tests that run fast and therefore are frequently executed; and slower tests, which are executed, for instance, once a day.

Independent: The order of execution of unit tests does not matter. For any tests T1 and T2, running T1 followed by T2 must produce the same result as running T2 and then T1. Indeed, T1 and T2 can also be executed concurrently. For the tests to be independent, T1 should not change any part of the global state that is later used by T2, and vice versa.
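As an illustration, the following sketch contrasts two dependent tests, which share global state, with their independent versions. To keep the example runnable without a testing framework, each test is a plain method that returns true when it passes. All names (SharedStateSketch, SHARED, and the test methods) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class SharedStateSketch {

  // Global (static) state shared by the dependent tests below.
  static final List<Integer> SHARED = new ArrayList<>();

  // Dependent tests: t1 changes the global state that t2 later reads.
  static boolean t1AddsElement() {
    SHARED.add(10);
    return SHARED.size() == 1;
  }

  static boolean t2ExpectsEmpty() {
    return SHARED.isEmpty();
  }

  // Independent tests: each one creates its own fixture.
  static boolean independentT1() {
    List<Integer> list = new ArrayList<>();
    list.add(10);
    return list.size() == 1;
  }

  static boolean independentT2() {
    List<Integer> list = new ArrayList<>();
    return list.isEmpty();
  }

  public static void main(String[] args) {
    // Order T2, T1: both pass.
    System.out.println(t2ExpectsEmpty() + " " + t1AddsElement()); // prints true true

    // Order T1, T2 (starting again from a clean state): T2 fails.
    SHARED.clear();
    System.out.println(t1AddsElement() + " " + t2ExpectsEmpty()); // prints true false

    // The independent versions pass in any order.
    System.out.println(independentT1() + " " + independentT2()); // prints true true
  }
}
```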

Repeatable: Unit tests should always provide the same result. That is, if a test T is called n times, the result should be the same in all n executions. Therefore, either T passes in every execution, or it always fails. Tests with non-deterministic results are also called Flaky Tests or Erratic Tests. Concurrency is one of the main causes of flaky behavior. An example is shown in the following test:

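@Test
public void exampleFlakyTest() throws InterruptedException {
  TaskResult result = new TaskResult();
  MyMath m = new MyMath();
  m.asyncPI(10, result);  // computes PI in a new thread
  Thread.sleep(1000);     // assumes one second is always enough
  assertEquals(3.1415926535, result.get(), 1e-10);
}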

This test calls a function that calculates the value of PI, with a certain accuracy. This function is asynchronous, that is, it runs in a new thread. In this example, the required accuracy is 10 decimal places. The test uses a sleep to wait for the asynchronous function to finish. However, this command turns the test non-deterministic: if the function finishes before 1000 milliseconds, the test will pass; but if execution takes longer, the test will fail. One possible solution is to only test the synchronous implementation of the function. If this implementation does not exist, a refactoring can be performed to extract it from the asynchronous code. In Section 8.5.2, we will give an example of such refactoring.
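A minimal sketch of this idea is shown next, assuming that the computation can be separated from the code that starts the thread. The class PiCalculator and its methods are hypothetical and do not correspond to the MyMath class of the previous test. The use of Machin's formula is also just a choice made for this illustration.

```java
import java.util.concurrent.CompletableFuture;

class PiCalculator {

  // Synchronous core, computed with Machin's formula:
  // pi = 16 * atan(1/5) - 4 * atan(1/239)
  static double computePi() {
    return 16 * Math.atan(1.0 / 5) - 4 * Math.atan(1.0 / 239);
  }

  // Asynchronous wrapper: only delegates to the synchronous core.
  static CompletableFuture<Double> computePiAsync() {
    return CompletableFuture.supplyAsync(PiCalculator::computePi);
  }

  public static void main(String[] args) {
    // Deterministic check: no thread and no sleep are involved.
    double pi = computePi();
    System.out.println(Math.abs(pi - 3.1415926535) < 1e-9); // prints true
  }
}
```

Since the test calls computePi directly, its result does not depend on how long a thread takes to run. If the asynchronous version also has to be exercised, calling join() on the returned CompletableFuture blocks until the value is available, which avoids guessing a waiting time.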

We might think that flaky tests are rare, but a study released by Google, covering their own tests, revealed that about 16% of them are subject to non-deterministic results (link). Consequently, these tests may fail not because a bug exists in the code, but due to non-deterministic events, such as a thread taking longer to execute. Flaky tests are problematic because they delay development: programmers spend time investigating the failure, only to find it’s a false alarm.

Self-checking: The result of unit tests should be easily verifiable. Developers, for instance, should not have to open and analyze an output file or manually provide input data to interpret the test result. Instead, the results should be displayed in the IDE, typically via components that turn green (to indicate all tests have passed) or red (to signal that a test has failed). Additionally, when a test fails, it should be possible to quickly identify the location of the failed assert command.

Timely: Tests should be written as early as possible, ideally even before the code that needs to be tested. This technique was briefly mentioned in Section 8.2.2 and will be discussed more deeply in the section on Test-Driven Development (Section 8.7).

8.3.2 Test Smells

Test Smells represent suboptimal implementation decisions in test code, which, in principle, should be avoided. The name is an adaptation, for the context of testing, of the concept of Code Smells or Bad Smells, which we will study in Chapter 9. However, in this chapter, we will already comment on smells that can occur in test code.

An Obscure Test is a long, complex, and difficult-to-understand test. As we’ve mentioned, tests are also used to document the system-under-test. Therefore, it’s important that they follow clear and quickly understandable logic. Ideally, a test, for instance, should test a single requirement of the system-under-test.

A Test with Conditional Logic includes code that may not be executed. That is, tests with if commands or loops should be avoided, and the code of unit tests should ideally be linear. Conditional logic in tests is considered a smell because it hinders understanding of the test.

Code Duplication in tests occurs when there are repeated blocks of code in several test methods.

However, these smells should not be taken literally, i.e., as a situation that needs to be avoided at all costs. Instead, they should be seen as a warning to test developers. When identifying a test smell, developers should consider whether it wouldn’t be possible to produce a simpler, shorter test, with linear code and without code duplication.

Lastly, just like with production code, tests should be frequently refactored to ensure they remain simple, easy to understand, and do not have smells.

8.3.3 Number of Asserts per Test

Some authors (link) recommend having at most one assert per test. That is, they recommend writing tests as follows:

@Test
public void testEmptyStack() {
  assertTrue(stack.isEmpty());
}

@Test
public void testNotEmptyStack() {
  stack.push(10);
  assertFalse(stack.isEmpty());
}

In other words, they do not recommend using two assert commands in the same test, as in the following code:

@Test
public void testEmptyStack() {
  assertTrue(stack.isEmpty());
  stack.push(10);
  assertFalse(stack.isEmpty());
}

The first example, which breaks the empty stack test into two, tends to be more readable and easier to understand than the second one, which does everything in a single test. Furthermore, when the tests in the first example fail, it’s simpler to identify the reason for the failure than in the second example, which can fail for two reasons.

However, we should not be dogmatic in following this rule (link, Chapter 4). The reason is that there are cases where it’s justified to have more than one assert per method. For example, suppose we need to test a getBook function that returns an object with the title, author, year, and publisher of a book. In this case, it’s justified to have four assert commands in the test, checking each of the fields of the returned object, as in the following code.

@Test
public void testBookService() {
  BookService bs = new BookService();
  Book b = bs.getBook(1234);
  assertEquals("Software Engineering", b.getTitle());
  assertEquals("Marco Tulio Valente", b.getAuthor());
  assertEquals("2024", b.getYear());
  assertEquals("ASERG/DCC/UFMG", b.getPublisher());
}

A second exception is when we have a simple method whose cases can all be checked in a single test. To illustrate, we show the test of the Strings.repeat function provided by the google/guava library.

@Test
public void testRepeat() {
  String input = "20";
  assertEquals("", Strings.repeat(input,0));
  assertEquals("20", Strings.repeat(input,1));
  assertEquals("2020", Strings.repeat(input,2));
  assertEquals("202020", Strings.repeat(input,3));
  ...
}

In this test, we have four assertEquals commands which test, respectively, the result of repeating a certain string zero, one, two, and three times.

8.4 Test Coverage

Test coverage is a metric that helps to determine the number of tests we need to write for a program. It measures the percentage of statements in a program covered by the existing tests, that is:

Test coverage = (Number of statements executed by the tests) / (Total number of statements in the program)

There are tools to calculate test coverage. The next figure shows an example using the tool that comes with the Eclipse IDE. The lines with a green background—as automatically highlighted by this tool—are the ones covered by the tests in StackTest. The only lines that are not in green are the ones responsible for the method signatures and, therefore, do not correspond to executable statements. The test coverage of this example is 100% because the tests executed all statements in the Stack class.

Tests achieving 100% statements coverage

Assume now that we did not implement testEmptyStackException. That is, we are not testing the exception that pop() raises when called with an empty stack. In this case, the coverage drops to 92.9%, as shown in the next figures.

Tests achieving 92.9% statements coverage
if statement was not completely covered by the tests

In these figures, the green lines are the ones covered by the execution of the tests. However, there is also a statement marked in yellow. This color indicates that the command is a branch (in this case, an if) and that only one of the possible paths of the branch (in this case, the false path) was exercised by the tests. Lastly, there is a line in red. This color indicates lines not covered by the tests.

In Java, the test coverage tools work by instrumenting the bytecode generated by the language compiler. As shown in the figure with coverage statistics, the previous program, after being compiled, has 52 instructions covered by the tests, out of a total of 56 instructions. Therefore, the test coverage is 52 / 56 = 92.9%.
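The arithmetic above can be reproduced with a few lines of code. The class CoverageSketch and the method coveragePercent are hypothetical names used only in this sketch, which rounds the result to one decimal place, as in the figures.

```java
class CoverageSketch {

  // Coverage as a percentage, rounded to one decimal place.
  static double coveragePercent(int covered, int total) {
    if (total <= 0) {
      throw new IllegalArgumentException("total must be positive");
    }
    return Math.round(1000.0 * covered / total) / 10.0;
  }

  public static void main(String[] args) {
    System.out.println(coveragePercent(52, 56)); // prints 92.9
    System.out.println(coveragePercent(56, 56)); // prints 100.0
  }
}
```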

8.4.1 What is the Ideal Test Coverage?

There is no magic or absolute target number for test coverage. The recommended coverage varies from project to project, depending on the complexity of the requirements, the criticality of the project, etc. In general, it does not need to be 100%, as there are always trivial methods in a system, such as getters and setters. Also, we have methods whose testing is more challenging, like user interface methods or methods with asynchronous behavior.

Therefore, it is not recommended to set a coverage goal that must always be achieved. Instead, we should monitor the evolution of coverage results over time, to check whether developers, for example, are not becoming less committed to writing tests. It is also recommended to carefully assess the statements or methods that are not covered by the existing tests, to confirm that they are not relevant or are indeed more challenging to test.

Given these considerations, teams who value writing tests easily reach coverage close to 70% (link). On the other hand, values below 50% tend to raise concerns (link). Lastly, even when using TDD, test coverage usually does not reach 100%, although it is generally over 90% (link).

Real World: At a Google developers conference, in 2014, some stats on coverage measures of the company’s systems were presented (see the video). At the median, Google’s systems had 78% statement coverage. As mentioned in the presentation, the recommendation is to reach 85% in most systems, although this is not set in stone. It was also mentioned that coverage varies by programming language. The lowest coverage was for C++ projects, slightly below 60% on average. The highest was measured for Python projects, slightly above 80%.

8.4.2 Other Definitions of Coverage

The definition of coverage presented above, based on statements, is the most common one. However, there are other definitions, such as function coverage (the percentage of functions executed by the tests), function call coverage (among all the statements in a program that call functions, the percentage exercised by the tests), and branch coverage (the percentage of branches of a program executed by the tests; an if always generates two branches: one when the condition is true and one when it is false). Statement and branch coverage are also called C0 Coverage and C1 Coverage, respectively. To illustrate the difference between the two, we will use the following class (first code) and its unit test (second code):

public class Math {

  public int abs(int x) {
    if (x < 0) {  
      x = -x;
    }  
    return x;
  }

}

import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class MathTest {

  @Test
  public void testAbs() {
    Math m = new Math();
    assertEquals(1,m.abs(-1));
  }

}

Assuming statement coverage, we have 100% coverage. However, assuming branch coverage, the value is 50% because we only tested one of the branches (the true branch) of the if (x < 0) statement. To achieve 100% branch coverage, we need another assert, such as assertEquals(1, m.abs(1)). Thus, branch coverage is stricter than statement coverage.
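To see where the 50% and 100% values come from, the next sketch instruments the abs method by hand, recording which branches of the if were taken. Real coverage tools do this automatically by instrumenting the bytecode. The class BranchCoverageSketch and its members are hypothetical names used only for illustration.

```java
import java.util.HashSet;
import java.util.Set;

class BranchCoverageSketch {

  // Branches of the if statement that were taken so far.
  static final Set<String> TAKEN = new HashSet<>();

  // Same logic as the abs method in the text, instrumented by hand.
  static int abs(int x) {
    if (x < 0) {
      TAKEN.add("true");   // condition was true
      x = -x;
    } else {
      TAKEN.add("false");  // condition was false
    }
    return x;
  }

  // The single if statement has two branches.
  static double branchCoverage() {
    return 100.0 * TAKEN.size() / 2;
  }

  public static void main(String[] args) {
    abs(-1);
    System.out.println(branchCoverage()); // prints 50.0
    abs(1);
    System.out.println(branchCoverage()); // prints 100.0
  }
}
```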

8.5 Testability

Testability refers to how easy it is to test a program. As we have seen, it is crucial that tests follow the FIRST principles, that they have few asserts, and achieve high coverage. However, the design of the production code should also facilitate the implementation of tests. This design property is called testability. In other words, a significant part of the effort in writing good tests should be allocated to the design of the system under test, not specifically to the design of the tests.

The good news is that code that follows the design principles we discussed in Chapter 5—such as high cohesion, low coupling, single responsibility, separation between presentation and model, dependency inversion, Demeter, among others—tends to exhibit good testability.

8.5.1 Example: Servlet

Servlet is a Java technology for implementing dynamic web pages; a servlet is a class that handles HTTP requests. As an example, we show next a servlet that calculates a person’s Body Mass Index (BMI), given their weight and height. Our goal here is merely didactic. Therefore, we will not explain the entire protocol for implementing servlets. Moreover, the logic of this example is very simple, consisting of the following formula: weight / (height * height). But try to imagine that it could be more complex; even in that case, the solution presented here would apply.

public class BMIServlet extends HttpServlet {

  public void doGet(HttpServletRequest req, 
                    HttpServletResponse res) 
                    throws IOException {
    res.setContentType("text/html");
    // getWriter may throw IOException, hence the throws clause
    PrintWriter out = res.getWriter();
    String weight = req.getParameter("weight");
    String height = req.getParameter("height");
    try {
      double w = Double.parseDouble(weight);
      double h = Double.parseDouble(height);
      double bmi = w / (h * h);
      out.println("Body Mass Index (BMI): " + bmi);
    }
    catch (NumberFormatException e) {
      out.println("Data must be numeric");
    }
  }
}  

First, notice that it’s not simple to write a test for BMIServlet, as it depends on other types from Java’s Servlet package. For example, it is not straightforward to instantiate a BMIServlet object and then call doGet. If we take this approach, we would also need to create HttpServletRequest and HttpServletResponse objects to pass as parameters to doGet. However, these types might rely on other types, and so on. In summary, the testability of BMIServlet is low.

An alternative for testing this class is to extract the domain logic to a separate class, as shown in the next code. This makes it easier to test the new domain class, called BMIModel, as it does not depend on Servlet-related types. For example, it is now straightforward to create a BMIModel object. After this refactoring, we still won’t be testing the complete code. However, it is better to test the domain part of the program than to leave its entire code uncovered by tests.

class BMIModel {
  public double calculateBMI(String w1, String h1) 
                throws NumberFormatException {
    double w = Double.parseDouble(w1);
    double h = Double.parseDouble(h1);
    return w / (h * h);
  }
}

public class BMIServlet extends HttpServlet {
  BMIModel model = new BMIModel();

  public void doGet(HttpServletRequest req, 
                    HttpServletResponse res) 
                    throws IOException {
    res.setContentType("text/html");
    PrintWriter out = res.getWriter();
    String weight = req.getParameter("weight");
    String height = req.getParameter("height");
    try {
      double bmi = model.calculateBMI(weight, height);
      out.println("Body Mass Index (BMI): " + bmi);
    }
    catch (NumberFormatException e) {
      out.println("Data must be numeric");
    }
  }
}  
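To make the gain in testability concrete, here is a sketch of a test for BMIModel that needs no Servlet-related type. It is written with plain checks instead of JUnit so that it runs on its own; the class BMIModelDemo and the tolerance of 0.05 are our assumptions.

```java
// Domain class, as extracted above.
class BMIModel {
  public double calculateBMI(String w1, String h1)
                throws NumberFormatException {
    double w = Double.parseDouble(w1);
    double h = Double.parseDouble(h1);
    return w / (h * h);
  }
}

class BMIModelDemo {
  public static void main(String[] args) {
    BMIModel model = new BMIModel();

    // Valid input: 82 kg and 1.80 m give a BMI of roughly 25.3.
    double bmi = model.calculateBMI("82", "1.80");
    if (Math.abs(bmi - 25.3) > 0.05) {
      throw new AssertionError("unexpected BMI: " + bmi);
    }

    // Invalid input: the parsing error propagates to the caller.
    boolean rejected = false;
    try {
      model.calculateBMI("abc", "1.80");
    } catch (NumberFormatException e) {
      rejected = true;
    }
    if (!rejected) {
      throw new AssertionError("non-numeric input was accepted");
    }
    System.out.println("BMIModel checks passed");
  }
}
```

Note that the test creates the object under test with a single new, without requests, responses, or a web container.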

8.5.2 Example: Asynchronous Call

Next, we show the implementation of the asyncPI function discussed in Section 8.3 when presenting the FIRST principles and, specifically, the concept of repeatable tests. As we explained, it’s not simple to test an asynchronous function, since its result is computed by another thread. The test in Section 8.3 used a Thread.sleep to wait for the result of asyncPI. However, this command makes the test non-deterministic (or flaky).

public class MyMath {

  public void asyncPI(int prec, TaskResult task) {
    new Thread (new Runnable() {
      public void run() {
        double pi = "calculates PI with precision prec"
        task.setResult(pi);
      }
    }).start();
  }

} 

Next, we show a solution to improve the testability of this class. First, we extract the code that computes the value of PI into a separate and synchronous function, called syncPI. This way, only this function will be tested by a unit test. In summary, the observation we made earlier still holds: it’s better to extract a function that is easy to test than to leave the whole code untested.

public class MyMath {
  public double syncPI(int prec) {
    double pi = "calculates PI with precision prec"
    return pi;
  }

  public void asyncPI(int prec, TaskResult task) {
    new Thread (new Runnable() {
      public void run() {
        double pi = syncPI(prec);
        task.setResult(pi);
      }    
    }).start();
  }
}  
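The sketch below shows how syncPI can be tested without threads or Thread.sleep. Since the text leaves the computation of PI unspecified, we fill it in with the Leibniz series, chosen only because it is short; we also assume that prec means the number of decimal places wanted. Both choices are ours.

```java
class MyMath {
  // Assumption: prec is the number of decimal places wanted.
  // The Leibniz series 4 * (1 - 1/3 + 1/5 - ...) converges slowly,
  // so this is an illustration rather than a practical algorithm.
  public double syncPI(int prec) {
    double tolerance = Math.pow(10, -prec);
    double sum = 0.0;
    double sign = 1.0;
    // In an alternating series with decreasing terms, the error is
    // bounded by the first omitted term, so we stop when it is small.
    for (long k = 0; 4.0 / (2 * k + 1) >= tolerance; k++) {
      sum += sign * 4.0 / (2 * k + 1);
      sign = -sign;
    }
    return sum;
  }
}

class SyncPIDemo {
  public static void main(String[] args) {
    double pi = new MyMath().syncPI(3);
    // Deterministic check: the result is available as soon as the call returns.
    if (Math.abs(pi - Math.PI) >= 1e-3) {
      throw new AssertionError("not precise enough: " + pi);
    }
    System.out.println("syncPI(3) is within 0.001 of Math.PI");
  }
}
```

Because the call is synchronous, the test gives the same verdict on every run, regardless of the machine's load.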

8.6 Mocks

To explain the role of mocks in unit tests, let’s start with a motivating example and discuss why it is difficult to write a unit test for it. Then, we will introduce the concept of mocks as a solution to test this example.

Notice: In this chapter, we are using the term mock with the same meaning as stub. We made this decision because it is followed by several testing tools. However, we include a subsection later to emphasize that some authors make a distinction between these terms.

Motivating Example: To illustrate the concept of mocks, let’s start with a simple class for book searching, whose code is shown below. This class, called BookSearch, implements a getBook method that searches for books on a remote service. This service, in turn, implements the BookService interface. To make the example more realistic, let’s assume that BookService represents a REST API. Regardless of that, the crucial point is that the search is conducted in another server, abstracted by the BookService interface. This server returns its result as a JSON document, i.e., a text document. Consequently, the getBook method accesses the remote server, retrieves the response in JSON format, and creates a Book object to store the search result. To keep the example clear, we omit the code for the Book class, but it has fields containing data about books and their corresponding getters.

import org.json.JSONObject;

public class BookSearch {

  BookService rbs;

  public BookSearch(BookService rbs) {
    this.rbs = rbs;
  }

  public Book getBook(int isbn) {
    String json = rbs.search(isbn);
    JSONObject obj = new JSONObject(json);
    String title = (String) obj.get("title");
    return new Book(title);
  }

}

public interface BookService {
  String search(int isbn);
}

Problem: We need to implement a unit test for BookSearch. However, by definition, a unit test exercises a small component of the program, such as a single class. The problem is that to test BookSearch we need a BookService, which is an external service. That is, if we are not careful, the test will reach an external service. This is problematic for two reasons: (1) the scope of the test will be larger than a small unit of code; (2) the test will be slower, since it is accessing a remote service, using a network protocol. However, unit tests should be fast, as recommended by the FIRST principles that we studied in Section 8.3.

Solution: One solution is to create an object that emulates the real object, but only for testing purposes. This kind of object is called a mock (or stub). In our example, the mock must implement the BookService interface and, therefore, the search method. However, this implementation is partial, as the mock just returns the titles of some books without accessing remote servers. An example is shown below:

import org.junit.*;
import static org.junit.Assert.*;

class BookConst {

  public static String SOFTENG = 
          "{ \"title\": \"Software Engineering\" }";

  public static String NULLBOOK = 
          "{ \"title\": \"NULL\" }";

}

class MockBookService implements BookService {

   public String search(int isbn) {
      if (isbn == 1234)
        return BookConst.SOFTENG;
      return BookConst.NULLBOOK;
   }

}

public class BookSearchTest {

  private BookService service;

  @Before
  public void init() {
    service = new MockBookService();
  }

  @Test
  public void testGetBook() {
    BookSearch bs = new BookSearch(service);
    String title = bs.getBook(1234).getTitle();
    assertEquals("Software Engineering", title);
  }

}

In this example, MockBookService is a class used to create mocks of BookService, i.e., objects that implement this interface with a trivial behavior. Particularly, the mock object, named service, only returns data about the book with ISBN 1234. The purpose of this mock is to allow the implementation of a test that does not access a remote and slow service. In the testGetBook method, we first use the mock to create an object of type BookSearch. Then, we call the getBook method to search for a book and return its title. Finally, we execute an assertEquals. As the test uses a MockBookService, it checks whether the returned title is that of the only book known to this mock.

However, one question remains: what does testGetBook actually test? In other words, what requirements are being verified with such a simple mock object? In this case, we are not testing access to the remote service, as mentioned earlier. This requirement is too complex for unit tests. Instead, we are just testing whether the logic of creating a Book object from a JSON document is working as expected. In a more comprehensive test, we can include additional fields in Book besides the title. Additionally, we can test with more books by extending the mock.

8.6.1 Mock Frameworks

Mocks (or stubs) are so common in unit tests that there are frameworks that facilitate their creation. We won’t delve into the details of these frameworks, but we at least present the code of the previous test using a mock created by a popular framework called Mockito (link).

import org.junit.*;
import static org.junit.Assert.*;
import org.mockito.Mockito;
import static org.mockito.Mockito.when;
import static org.mockito.ArgumentMatchers.anyInt;

public class BookSearchTest {

  private BookService service;

  @Before
  public void init() {
    service = Mockito.mock(BookService.class);
    when(service.search(anyInt())).
                 thenReturn(BookConst.NULLBOOK);
    when(service.search(1234)).thenReturn(BookConst.SOFTENG);
  }        


  @Test
  public void testGetBook() {
    BookSearch bs = new BookSearch(service);
    String title = bs.getBook(1234).getTitle();
    assertEquals("Software Engineering", title);
  }

}

First, we can see that there is no longer a MockBookService class. The main benefit of using a mock framework is precisely this: no longer having to implement mocks manually. Instead, the mock for BookService is created by the framework itself using the reflection features of Java. We just need to call the Mockito.mock(type) function, as follows:

service = Mockito.mock(BookService.class);

However, the mock service is initially created without any behavior. We then have to teach it to behave at least in some situations. Specifically, we have to teach it to respond to some book searches. For this, Mockito offers a simple domain-specific language, based on Java syntax. An example is shown below:

when(service.search(anyInt())).thenReturn(BookConst.NULLBOOK);

when(service.search(1234)).thenReturn(BookConst.SOFTENG);

These two lines program our mock. First, we command it to return BookConst.NULLBOOK when the search method is called with any integer as an argument. Then, we open an exception to this general rule: when search is called with the argument 1234, it should return the JSON string that describes the SOFTENG book.
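To give an intuition of how a framework can create a mock from nothing but an interface, here is a minimal sketch based on java.lang.reflect.Proxy from the JDK. It is not how Mockito is implemented, and the names TinyMock and mockBookService are hypothetical; the sketch only shows that an object implementing BookService can be produced at run time, with a general rule and exceptions to it.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

interface BookService {
  String search(int isbn);
}

class TinyMock {
  // Creates, at run time, an object that implements BookService.
  // Calls to search are answered from the answers map (the exceptions)
  // or with defaultAnswer (the general rule).
  static BookService mockBookService(Map<Integer, String> answers,
                                     String defaultAnswer) {
    InvocationHandler handler = (proxy, method, args) -> {
      if (method.getName().equals("search")) {
        return answers.getOrDefault((Integer) args[0], defaultAnswer);
      }
      // Any other method (including toString) is not supported here.
      throw new UnsupportedOperationException(method.getName());
    };
    return (BookService) Proxy.newProxyInstance(
        BookService.class.getClassLoader(),
        new Class<?>[] { BookService.class },
        handler);
  }
}

class TinyMockDemo {
  public static void main(String[] args) {
    Map<Integer, String> answers = new HashMap<>();
    answers.put(1234, "{ \"title\": \"Software Engineering\" }");
    BookService service =
        TinyMock.mockBookService(answers, "{ \"title\": \"NULL\" }");
    System.out.println(service.search(1234)); // programmed answer
    System.out.println(service.search(42));   // default answer
  }
}
```

No class implementing BookService appears in the source code; the implementation exists only while the program runs, which is the property that spares us from writing MockBookService by hand.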

8.6.2 Mocks vs Stubs

Some authors, such as Gerard Meszaros (link), make a distinction between mocks and stubs. According to them, a mock is used to verify not only the state of the System Under Test (SUT) but also its behavior, that is, the calls the SUT makes to its dependencies. When the emulated object only supplies canned answers, so that the test verifies just the state (as in our example), it should be called a stub, according to Meszaros. However, in this book, we will not make this distinction. We find it subtle, and therefore, the benefits do not outweigh the cost of extra paragraphs to explain similar concepts.

However, just to clarify a bit more, a behavioral test—also called an interaction test—checks for events (e.g., method calls) that occur during the execution of the tests. Here is an example:

void testBehaviour() {
  Mailer m = mock(Mailer.class);
  sut.someBusinessLogic(m);
  verify(m).send(anyString());
}

In this example, the verify command, provided by Mockito, is similar to an assert. However, it checks if an event occurred with the mock passed as an argument. In this case, we verify if the mock’s send method was executed exactly once (the default in Mockito when no number of calls is specified), using any string as an argument.
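The same kind of interaction check can be written by hand, which may help to see what a behavioral test observes. In the sketch below, Mailer, OrderProcessor, and RecordingMailer are hypothetical names of our own; the test double records every call so that the test can inspect the calls afterwards.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical dependency; only send is needed for the sketch.
interface Mailer {
  void send(String message);
}

// Hypothetical system under test: it is expected to send one message.
class OrderProcessor {
  void someBusinessLogic(Mailer m) {
    m.send("order confirmed");
  }
}

// Hand-written double that records the interactions it receives.
class RecordingMailer implements Mailer {
  final List<String> sent = new ArrayList<>();

  public void send(String message) {
    sent.add(message);
  }
}

class BehaviourDemo {
  public static void main(String[] args) {
    RecordingMailer m = new RecordingMailer();
    new OrderProcessor().someBusinessLogic(m);
    // The check is about an event (a call to send), not about a
    // value returned by the system under test.
    if (m.sent.size() != 1) {
      throw new AssertionError("send called " + m.sent.size() + " times");
    }
    System.out.println("send was called once");
  }
}
```

Note that someBusinessLogic returns nothing: the only way to observe its effect is through the calls it makes on its dependency.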

Indeed, according to Meszaros, mocks and stubs are special cases of double objects (also known as test doubles). Besides mocks and stubs, there are two other types of doubles:

  • Dummy Objects are passed as arguments to a method but they are not used in the method’s body. Thus, they are used only to bypass the language type system.

  • Fake Objects have a simpler implementation than a real object. For example, they can simulate a database in main memory.
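As an illustration of a fake object, the sketch below replaces a database with a map kept in main memory. The BookRepository interface and its methods are hypothetical, introduced only for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical persistence interface; in production it would be
// implemented on top of a real database.
interface BookRepository {
  void save(int isbn, String title);
  Optional<String> findTitle(int isbn);
}

// Fake: a working but simplified implementation that keeps data in memory.
class InMemoryBookRepository implements BookRepository {
  private final Map<Integer, String> rows = new HashMap<>();

  public void save(int isbn, String title) {
    rows.put(isbn, title);
  }

  public Optional<String> findTitle(int isbn) {
    return Optional.ofNullable(rows.get(isbn));
  }
}

class FakeDemo {
  public static void main(String[] args) {
    BookRepository repo = new InMemoryBookRepository();
    repo.save(1234, "Software Engineering");
    System.out.println(repo.findTitle(1234).isPresent()); // saved book
    System.out.println(repo.findTitle(42).isPresent());   // unknown book
  }
}
```

Unlike a stub, which returns canned answers, the fake really stores and retrieves data; it just does so in a way that would not be acceptable in production.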

8.6.3 Example: Servlet

In the previous section, we discussed the test of a servlet that calculates the Body Mass Index (BMI) of a person. We argued that testing this servlet is challenging due to its complex dependencies, which are difficult to recreate in a test. Now, however, we know that we can create mocks for these dependencies, i.e., objects that emulate the real dependencies but only respond to the calls needed in our test.

First, let’s reintroduce the servlet we want to test:

public class BMIServlet extends HttpServlet {

  BMIModel model = new BMIModel();

  public void doGet(HttpServletRequest req, 
                    HttpServletResponse res) 
                    throws IOException {
    res.setContentType("text/html");
    PrintWriter out = res.getWriter();
    String weight = req.getParameter("weight");
    String height = req.getParameter("height");
    double bmi = model.calculateBMI(weight,height);
    out.println("BMI: " + bmi);
  }

}

And here is the new test for this servlet (it is an adaptation of an example used in an article by Dave Thomas and Andy Hunt). First, in the init method, we create mocks for the HttpServletRequest and HttpServletResponse objects. These mocks are used as parameters for the doGet call made in the test method. Still in init, we create a StringWriter object, which collects the output in a string. Then, this object is encapsulated by a PrintWriter, which is the output object used by the servlet. That is, this is an example of the Decorator design pattern, which we studied in Chapter 6. Finally, we program the response of the mock: when the servlet asks for an output object, by calling getWriter(), it should return the PrintWriter object we just created. In summary, we did all these steps to redirect the servlet output to a string that the test can inspect.

public class BMIServletTest {

  HttpServletRequest req;
  HttpServletResponse res;

  StringWriter sw;

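  @Before
  public void init() throws IOException {
    req = Mockito.mock(HttpServletRequest.class);
    res = Mockito.mock(HttpServletResponse.class);
    sw = new StringWriter();
    PrintWriter pw = new PrintWriter(sw);
    // getWriter declares IOException, hence the throws clause
    when(res.getWriter()).thenReturn(pw);
  }
  
  @Test
  public void testDoGet() throws IOException {
    when(req.getParameter("weight")).thenReturn("82");
    when(req.getParameter("height")).thenReturn("1.80");
    new BMIServlet().doGet(req,res);
    // 82 / (1.80 * 1.80) is about 25.31; the servlet prints the
    // unrounded double, so we check only the leading digits
    assertTrue(sw.toString().startsWith("BMI: 25.3"));
  }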

}

In the testDoGet method, we begin by programming the mock with the input parameters of the servlet. When the servlet requests the weight parameter, the mock returns 82; when it requests the height parameter, it returns 1.80. After that, the test follows the typical flow of unit tests: we call the method we want to test, doGet, and check if it produces the expected output.

This example also illustrates the disadvantages of using mocks. The primary drawback is that mocks increase the coupling between the test and the SUT. Typically, in unit tests, the test calls the tested method and checks its result. This way, the test doesn’t break when the internal code of the tested method is modified. However, when using mocks, this is no longer true, as the mock can depend on internal structures or events of the tested method, making the tests fragile. For instance, if the servlet’s output changes to Body Mass Index(BMI): [value], we must update the assertion in the unit test.

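Finally, it’s important to note that not all objects and methods can be mocked with the same ease. Traditionally, the following structures could not be mocked: final classes and methods, static methods, and constructors. Recent versions of some frameworks, including Mockito, offer specific support for them.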

8.7 Test-Driven Development

Test-Driven Development (TDD) is one of the programming practices proposed by Extreme Programming (XP). At first, the idea seems counterintuitive: given a unit test T for a class C, TDD argues that we should implement T before C. For this reason, this technique is also known as Test-First Development.

When we write the test first, it’s going to fail. Thus, in the workflow advocated by TDD, the next step is to write the code that makes this test pass, even if it’s initially just trivial code. Then, this code should be completed and refined. Finally, if necessary, it should be refactored to improve its design, readability, and maintainability, to follow design principles and patterns, etc.

TDD was proposed with three objectives in mind:

  • TDD prevents developers from forgetting to write tests. The reason is that TDD promotes testing as the first activity of any programming task, be it fixing a bug or implementing a new feature. Hence, it becomes more difficult to postpone the writing of tests to a later moment. Indeed, as we mentioned in Section 8.4, when using TDD, test coverage is usually greater than 90%.

  • TDD encourages writing code with high testability. This benefit is a natural consequence of the workflow inversion proposed by TDD: as developers have to write first the test T and then the class C, it is natural that they will design C to facilitate the writing of tests.

  • TDD is not only a testing but also a design practice. This happens because developers, by starting with the tests T, put themselves in the position of a user of the class C. In other words, with TDD, the first user of the class is its own developer—remember that T is a client of C since it calls methods from C. Therefore, it is expected that developers will define a simple interface for the class, using readable method names and avoiding many parameters, for example.

When working with TDD, developers should follow a cycle composed of three states, as shown in the next figure.

Cycles of TDD

According to this diagram, the first goal is to reach the red state, in which the test is not yet passing. It may seem strange, but the red state is already a small victory: by writing a test that fails, developers have a specification of the class that they need to implement next. As we have mentioned, in this state, it is also important that the developers think about the interface of the class under test, putting themselves in the position of a user of this class. Lastly, it is important that the class compiles. For this, developers must define at least the name of the class and the signature of its methods.

Next, the goal is to reach the green state. To do this, developers must implement the full functionality of the class under test and thus the tests will start to pass. However, this implementation can be performed in baby steps. Perhaps, in the initial steps, the code will be working partially, for example, returning only constants. This process will become clearer in the example that we will present soon.

Finally, developers should look for opportunities to refactor the class and the test. When using TDD, the goal is not just to reach the green state, when the program is working. In addition, developers should check the quality of this code. For example, they should check whether there is duplicate code, whether there are large methods that can be broken into smaller ones, whether there are methods that can be moved to a different class, etc. After the refactoring step, we can finish or restart the cycle to implement another feature.

8.7.1 Example: Shopping Cart

To conclude, let’s simulate a programming session using TDD. For this, we will use a virtual bookstore system as an example. In this system, we have a Book class, with the attributes title, isbn, and price. And we also have a ShoppingCart class, which stores the books the customer decided to buy. This class must have methods to add a book to the cart, return the total price of the books in the cart, and remove a book from the cart. Next, we describe the implementation of these methods using TDD.

Red State: We start by defining that ShoppingCart has an add and a getTotal method. Besides defining the names of these methods, we define their parameters and write the first test:

@Test
void testAddGetTotal() {
  Book b1 = new Book("book1", 10, "1");
  Book b2 = new Book("book2", 20, "2");
  ShoppingCart cart = new ShoppingCart();
  cart.add(b1);
  cart.add(b2);
  // the third argument is the tolerance for comparing doubles
  assertEquals(30.0, cart.getTotal(), 0.001);
}

Despite being simple and easy to understand, this test does not compile, as there is no implementation for the Book and ShoppingCart classes. Thus, we have to provide them, as shown next:

public class Book {
  public String title;
  public double price;
  public String isbn;

  public Book(String title, double price, String isbn) {
    this.title = title;
    this.price = price;
    this.isbn = isbn;
  }

}

public class ShoppingCart {

  public ShoppingCart() {}

  public void add(Book b) {}

  double getTotal() {
    return 0.0;
  }
}

The implementation of both classes is very simple. We implemented just the minimum for the program to compile. Note, for example, that getTotal just returns 0.0. Despite this, we achieved our goal in the red state: we now have a test that compiles, runs, and fails!

Green State: The previous test can be seen as a specification for what we need to implement in ShoppingCart. So let’s do that:

public class ShoppingCart {
  public ShoppingCart() {}
  public void add(Book b) {}
  double getTotal() {
    return 30.0;
  }
}

The reader may be surprised again: this implementation is incorrect! The ShoppingCart constructor is empty, the class does not have attributes, getTotal always returns 30.0, etc. All of this is true, but we have achieved another small victory: the test changed from red to green. So, it is passing. With TDD, the improvements are always small. In XP’s vocabulary, they are called baby steps.

However, we should continue and provide a more realistic implementation for ShoppingCart. Here it is:

import java.util.ArrayList;

public class ShoppingCart {

  private ArrayList<Book> items;

  private double total;

  public ShoppingCart() {
    items = new ArrayList<Book>();
    total = 0.0;  
  }

  public void add(Book b) {
    items.add(b);
    total += b.price;
  }

  double getTotal() {
    return total;
  }

}

Now we have a list to store the cart items, an attribute to store the total value of the books, a constructor, an add method that adds the books to the list and increases the cart’s total, and so on. So, to the best of our understanding, this implementation meets the class specification and thus we have reached the green state.

Yellow State: Finally, we should look at the code that was implemented and put into practice the properties, principles, and design patterns we learned in the previous chapters. In other words: is there anything we can do to make this code more readable and easier to understand and maintain? In this case, the idea that may arise is to encapsulate the Book fields. They are currently public, so we can make them private and implement getter methods to access them. As this implementation is simple, we won’t show the refactored code here.

At this point, we completed an iteration in the red-green-refactor TDD cycle. Now, we can stop, or think about implementing another requirement. For example, we can implement the method to remove books from the cart. For this, we should start another cycle.
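As a sketch of what such a new cycle could produce, the code below shows a possible final result for a remove method. The check in main plays the role of the test that would be written first, in the red state. Several details are our assumptions: Book already has private fields and getters, remove deletes a single copy of the book, and books are matched by object identity, since Book does not define equals.

```java
import java.util.ArrayList;
import java.util.List;

class Book {
  private final String title;
  private final double price;
  private final String isbn;

  Book(String title, double price, String isbn) {
    this.title = title;
    this.price = price;
    this.isbn = isbn;
  }

  String getTitle() { return title; }
  double getPrice() { return price; }
  String getIsbn() { return isbn; }
}

class ShoppingCart {
  private final List<Book> items = new ArrayList<>();
  private double total = 0.0;

  public void add(Book b) {
    items.add(b);
    total += b.getPrice();
  }

  // Removes one copy of the book, if present, and adjusts the total.
  public void remove(Book b) {
    if (items.remove(b)) {
      total -= b.getPrice();
    }
  }

  public double getTotal() {
    return total;
  }
}

class RemoveCycleDemo {
  public static void main(String[] args) {
    // This check would be written first and fail (red state)
    // until remove is implemented (green state).
    Book b1 = new Book("book1", 10, "1");
    Book b2 = new Book("book2", 20, "2");
    ShoppingCart cart = new ShoppingCart();
    cart.add(b1);
    cart.add(b2);
    cart.remove(b1);
    if (Math.abs(cart.getTotal() - 20.0) > 0.001) {
      throw new AssertionError("unexpected total: " + cart.getTotal());
    }
    System.out.println("remove works for a book in the cart");
  }
}
```

Following the baby steps idea, a first green version of remove could be much simpler than this one, being refined only when a new test demands it (for example, a test that removes a book that is not in the cart).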

8.8 Integration Tests

With integration tests (also referred to as service tests), we move to an intermediate level of the testing pyramid (see a figure of this pyramid in the first section of the chapter). Thus, the objective shifts from testing a small unit of code, like a single class, to exercising a complete service, that is, a complete feature of the system. Therefore, integration tests involve more classes, sometimes from distinct packages. They also exercise real dependencies and systems, such as databases and remote services. Moreover, when implementing integration tests, we don’t use mocks. As these are larger tests, they take more time to run and, consequently, are executed less frequently.

8.8.1 Example: Appointment App

Consider a simple app to add, remove, and edit appointments, as illustrated in the next figure. In this app, there is a class with methods to handle the appointments, as shown below:

Appointments app

public class AgendaFacade {
  public AgendaFacade(DB db);
  int addAppointment(Appointment p);
  void removeAppointment(int id);
  Appointment[] listAppointments();
}

Thus, we can write the following integration test for this class:

@Test
void AgendaFacadeTest() {
  DB db = DB.create();
  AgendaFacade agenda = new AgendaFacade(db);
  Appointment app1 = new Appointment(...);
  Appointment app2 = new Appointment(...);
  Appointment app3 = new Appointment(...);
  int id1 = agenda.addAppointment(app1);
  int id2 = agenda.addAppointment(app2);
  int id3 = agenda.addAppointment(app3);
  Appointment [] apps = agenda.listAppointments();
  assertEquals(3,apps.length);
}

It is worth mentioning two points about this test. First, it is implemented using JUnit, like the previous unit tests we studied in this chapter. That is, JUnit can be used for both unit and integration tests. Second, since it is an integration test, the class is tested with real dependencies, in this case, a database. At the beginning of the test, this database is created with all the tables empty. Then, three appointments are saved and then retrieved from the database. Finally, an assert is called. Thus, this test exercises the main methods of our app, except those related to its graphical interface.

8.9 End-to-End Tests

End-to-end tests—also called system tests or interface tests—are positioned at the top of the testing pyramid. These are tests that simulate the use of a system by a real user. They are the most expensive tests, requiring more effort to implement and taking the longest to execute.

8.9.1 Example: Web System Test

Selenium is a framework for automating the tests of web systems. The framework allows the implementation of tests that act like robots opening web pages, filling out forms, clicking buttons, checking responses, etc. An example—extracted and adapted from the Selenium documentation (link)—is shown below. This code simulates a Firefox user making a Google search for the word software. The test prints out the title of the page with the results of this search.

public class SeleniumExample {

  public static void main(String[] args) {
    // creates a driver to access a web server
    WebDriver driver = new FirefoxDriver();

    // instructs the driver to "navigate" on Google
    driver.navigate().to("http://www.google.com");

    // gets a data input field, named "q"
    WebElement element = driver.findElement(By.name("q"));

    // fills this field with the word "software"
    element.sendKeys("software");

    // submits the data
    element.submit();

    // waits for the response page to load (8s timeout)
    (new WebDriverWait(driver,8)).
         until(new ExpectedCondition<Boolean>() {
           public Boolean apply(WebDriver d) {
             return d.getTitle().toLowerCase()
                     .startsWith("software");
           }
         });

    // result should be: "software - Google Search"
    System.out.println("Page title is: "+driver.getTitle());

    // closes the browser
    driver.quit();
  }
}

Interface tests are harder to write, at least compared to unit tests and even integration tests. For example, the Selenium API is more complex than that of JUnit. Also, the test must handle interface events, like timeouts that occur when a page takes longer than usual to load. Interface tests are also more fragile, meaning they can break due to minor changes in the interface. For example, if the name of the search field on Google’s main page changes, the previous test has to be updated. However, when compared to the alternative—conducting the test manually—they are still competitive and have their benefits.

8.9.2 Example: Compiler Test

When implementing a compiler, we can use both unit and integration tests. But end-to-end tests, in this case, tend to be conceptually simpler. The reason is that a compiler interface doesn’t have windows and pages with graphical elements. Instead, a compiler receives an input file and produces an output file. Thus, to implement end-to-end tests for a compiler C for language X, we should create a set of programs in X, exercising various aspects of this language. For each program P, we should define a set of input and output data. Preferably, the output should be in a simple format, like a list of strings. In this context, the end-to-end tests are as follows: first, call C to compile each program P; then, run P with the defined input and verify if the result is as expected. This script is an end-to-end test, as we are exercising all modules of the compiler.

When compared to unit tests, it is harder to locate the code responsible for a failure in end-to-end tests. For example, in the case of the compiler tests, we will receive an indication that a program is not executing correctly. However, it might be challenging to map this failure to the compiler function responsible for the buggy code.

8.10 Other Types of Testing

8.10.1 Black-Box and White-Box Testing

Testing techniques can be classified as black-box or white-box. When using a black-box technique, tests are written considering only the interface of the code under test. For example, if the goal is to test a method as a black-box, the only available information is its name, parameters, return types, and exceptions. On the other hand, when using a white-box technique, the implementation of the tests considers information about the code and its structure. Black-box testing techniques are also referred to as functional tests, and white-box techniques are called structural tests.

However, it is not straightforward to classify unit tests into either of these categories. Indeed, the classification depends on how the tests are written. If the unit tests are written using information only about the interface of the methods under test, they are considered black-box. However, if their implementation considers information about test coverage, such as branches that are covered or not, then they are white-box tests. In summary, unit tests test a small and isolated unit of code. This unit can be tested in the form of a black-box (considering only its interface and specification) or in the form of a white-box (considering and taking advantage of its internal structure for implementing more effective tests).

A similar observation can be made about the relationship between TDD and black-box/white-box testing. To clarify this relationship, let’s reproduce the following comment from Kent Beck (source: Test-Driven Development Violates the Dichotomies of Testing, Three Rivers Institute, 2007):

Another misleading dichotomy is between black-box and white-box tests. Since TDD tests are written before the code they are to test, they are black-box tests. However, I commonly get the inspiration for the next test from looking at the code, a hallmark of white-box testing.

8.10.2 Test Data Selection

When adopting black-box testing, there are techniques to assist in the selection of the inputs that will be tested. For example, Equivalence Classes is a technique that recommends dividing the inputs of a program into sets of values that have the same chance of presenting a bug. These sets are called equivalence classes. For each equivalence class, we should test only one of its values, which can be selected randomly. Consider a function that calculates the income tax rate to apply to each salary range, as illustrated in the next table. Partitioning based on equivalence classes recommends testing this function with four salaries, one from each salary range.

Salary                        Tax Rate
From 1,903.99 to 2,826.65     7.5%
From 2,826.66 to 3,751.05     15%
From 3,751.06 to 4,664.68     22.5%
Above 4,664.68                27.5%

Boundary Value Analysis is a complementary technique that recommends testing also with the boundary values of each equivalence class and with the values that precede or succeed such boundaries. The reason is that bugs are often caused by inappropriately handling boundary conditions. Thus, in our example, for the first salary range, we should also test with these values:

  • 1,903.98: a value that precedes the lower limit
  • 1,903.99: lower limit
  • 2,826.65: upper limit
  • 2,826.66: a value that succeeds the upper limit
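The two techniques can be sketched as follows. This is only an illustration under some assumptions: the IncomeTax class and its rate method are hypothetical, salaries are represented in cents to avoid rounding issues, and salaries below 1,903.99 are assumed to be exempt, which the table does not state. The check helper is a stand-in for JUnit's assertEquals, so that the code has no dependencies.

```java
// Hypothetical class; names and the use of cents are assumptions of this sketch.
class IncomeTax {

  // Returns the tax rate for a salary given in cents.
  // Assumption: salaries below 1,903.99 are exempt (rate 0.0).
  static double rate(long salaryInCents) {
    if (salaryInCents < 190_399) return 0.0;
    if (salaryInCents <= 282_665) return 0.075;
    if (salaryInCents <= 375_105) return 0.15;
    if (salaryInCents <= 466_468) return 0.225;
    return 0.275;
  }
}

class IncomeTaxTest {

  // Stand-in for JUnit's assertEquals.
  static void check(double expected, double actual) {
    if (expected != actual)
      throw new AssertionError("expected " + expected + ", got " + actual);
  }

  // Equivalence classes: one arbitrary salary from each range.
  static void testOneValuePerClass() {
    check(0.075, IncomeTax.rate(230_000));  // 2,300.00
    check(0.15,  IncomeTax.rate(320_000));  // 3,200.00
    check(0.225, IncomeTax.rate(420_000));  // 4,200.00
    check(0.275, IncomeTax.rate(900_000));  // 9,000.00
  }

  // Boundary values of the first range and their neighbors.
  static void testBoundariesOfFirstRange() {
    check(0.0,   IncomeTax.rate(190_398));  // 1,903.98: precedes the lower limit
    check(0.075, IncomeTax.rate(190_399));  // 1,903.99: lower limit
    check(0.075, IncomeTax.rate(282_665));  // 2,826.65: upper limit
    check(0.15,  IncomeTax.rate(282_666));  // 2,826.66: succeeds the upper limit
  }

  public static void main(String[] args) {
    testOneValuePerClass();
    testBoundariesOfFirstRange();
  }
}
```

If one of the comparisons in rate used the wrong operator, for example < instead of <=, the first test would probably still pass, but the boundary test would fail.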

However, as the reader might be wondering, it is not always straightforward to define the equivalence classes for the input domain of a function. That is, not all functions are organized into well-defined input ranges like those in our example.

To conclude, exhaustive testing, that is, testing a program with all possible inputs, is impossible in practice, even for small programs. For example, even a function with only two integer parameters can take years to be tested with all possible pairs of integers. Random tests, where the test data is chosen randomly, are also insufficient in most cases. The reason is that we may select several values from the same equivalence class, which is redundant, while other equivalence classes are left untested.
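As a rough estimate of the cost of exhaustive testing, the sketch below computes the time needed to test a function with two 32-bit integer parameters. The class name and the rate of one billion test executions per second are assumptions chosen only to give an order of magnitude.

```java
import java.math.BigInteger;

class ExhaustiveTestingEstimate {

  // Years needed to test every pair of 32-bit integers,
  // at the given (assumed) number of test executions per second.
  static BigInteger yearsToTestAllPairs(long testsPerSecond) {
    BigInteger pairs = BigInteger.ONE.shiftLeft(64);  // 2^32 * 2^32 input pairs
    BigInteger seconds = pairs.divide(BigInteger.valueOf(testsPerSecond));
    BigInteger secondsPerYear = BigInteger.valueOf(365L * 24 * 60 * 60);
    return seconds.divide(secondsPerYear);
  }

  public static void main(String[] args) {
    // Assumption: one billion test executions per second.
    System.out.println(yearsToTestAllPairs(1_000_000_000L) + " years");
  }
}
```

Under this assumption, the program prints 584 years, which is why test data has to be selected rather than enumerated.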

8.10.3 Acceptance Tests

Acceptance tests are carried out by the customers, using their own data. The results determine whether or not the customers accept the implemented software. If they do, the system can be put into production. If they do not, the necessary adjustments need to be made. For example, when using agile methods, a user story is only considered finished after it passes the acceptance tests defined and conducted by the Product Owner.

Acceptance tests have two characteristics that distinguish them from the tests we’ve studied earlier in this chapter. First, they are usually manual tests, carried out by the customers or their representatives. Second, they are not exclusively a verification activity (as with the previous tests), but also a software validation activity. As we studied in Chapter 1, verification tests if we’ve implemented the software correctly, that is, in line with its specification. Meanwhile, validation tests if we’ve implemented the correct software, that is, the one requested and required by the customers.

Acceptance tests are commonly divided into two main types. Alpha tests are conducted with customers, but in a controlled environment, such as the developer’s machine. If the system passes such tests, a test with a larger customer group can be undertaken, this time no longer in a controlled environment. These tests are referred to as beta tests.

8.10.4 Non-Functional Requirements Testing

The previous testing strategies check only functional requirements; therefore, their goal is to find bugs. However, it is also possible to test non-functional requirements. For example, there are tools that support the execution of performance tests, which check the system’s behavior under a given load. An e-commerce company can use these tools to simulate the load on its website during major events, such as Black Friday. Usability tests evaluate the system’s user interface and frequently involve observing real users as they use the system. Failure tests simulate abnormal events in a system, for example, the failure of some services or even of an entire data center.
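The idea behind a performance test can be sketched with a few lines of code, although dedicated tools offer much more (reports, gradual load increase, distributed load generation, etc.). In the following example, all names are hypothetical, and handleRequest is a stand-in that just sleeps for 5 ms to simulate the operation under load, such as a request to a website.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class LoadTestSketch {

  // Hypothetical stand-in for the operation under load.
  static void handleRequest() throws InterruptedException {
    Thread.sleep(5);
  }

  // Issues `requests` calls using `users` concurrent threads and
  // returns the latency, in milliseconds, of the slowest call.
  static long maxLatencyMillis(int users, int requests) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(users);
    try {
      List<Callable<Long>> tasks = new ArrayList<>();
      for (int i = 0; i < requests; i++) {
        tasks.add(() -> {
          long start = System.nanoTime();
          handleRequest();
          return (System.nanoTime() - start) / 1_000_000;
        });
      }
      long max = 0;
      for (Future<Long> f : pool.invokeAll(tasks)) {
        max = Math.max(max, f.get());
      }
      return max;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    long max = maxLatencyMillis(20, 200);
    System.out.println("slowest request: " + max + " ms");
    // A real performance test would fail if max exceeded an agreed limit.
  }
}
```

Because results of this kind depend on the machine and on the current load, performance tests are usually run in a dedicated environment and compared against limits agreed with the stakeholders.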

Bibliography

Gerard Meszaros. xUnit Test Patterns: Refactoring Test Code. Addison-Wesley, 2007.

Kent Beck, Erich Gamma. Test-infected: programmers love writing tests. Java Report, 3(7):37-50, 1998.

Kent Beck. Test-Driven Development: By Example. Addison-Wesley, 2002.

Dave Thomas, Andy Hunt. Mock Objects. IEEE Software, 2002.

Maurício Aniche. Testes automatizados de software: um guia prático. Casa do Código, 2015.

Jeff Langr, Andy Hunt, Dave Thomas. Pragmatic Unit Testing in Java 8 with JUnit. O’Reilly, 2015.

Exercises

1. Describe three benefits associated with unit testing.

2. Suppose a function fib(n), which returns the n-th term of the Fibonacci sequence, i.e., fib(0) = 0, fib(1) = 1, fib(2) = 1, fib(3) = 2, fib(4) = 3, etc. Write a unit test for this function.

3. Rewrite the following test, which checks the occurrence of an EmptyStackException, to make it simpler and easier to understand.

@Test
public void testEmptyStackException() {
  boolean success = false;
  try {
    Stack<Integer> stack = new Stack<Integer>();
    stack.push(10);
    int r = stack.pop();
    r = stack.pop();
  } catch (EmptyStackException e) {
    success = true;
  }
  assertTrue(success);
}

4. Suppose a developer wrote the following test for the Java ArrayList class. As you’ll notice, several System.out.println calls are used in this code. Thus, in essence, it’s a manual test, as the developer has to manually check the results. Rewrite each test (from 1 to 6) as a unit test.

import java.util.List;
import java.util.ArrayList;

public class Main {

  public static void main(String[] args) {

    // test 1  
    List<Integer> s = new ArrayList<Integer>();
    System.out.println(s.isEmpty());

    // test 2
    s = new ArrayList<Integer>();
    s.add(1);
    System.out.println(s.isEmpty());

    // test 3
    s = new ArrayList<Integer>();
    s.add(1);
    s.add(2);
    s.add(3);
    System.out.println(s.size());
    System.out.println(s.get(0));
    System.out.println(s.get(1));
    System.out.println(s.get(2));

    // test 4
    s = new ArrayList<Integer>();
    s.add(1);
    s.add(2);
    s.add(3);
    int elem = s.remove(2);
    System.out.println(elem);
    System.out.println(s.get(0));
    System.out.println(s.get(1));

    // test 5
    s = new ArrayList<Integer>();
    s.add(1);
    s.remove(0);
    System.out.println(s.size());
    System.out.println(s.isEmpty());

    // test 6
    try {
      s = new ArrayList<Integer>();
      s.add(1);
      s.add(2);
      s.remove(2);        
    }

    catch (IndexOutOfBoundsException e) {
      System.out.println("IndexOutOfBound");
    }

  }

}

5. The following function has four statements, including two if statements, which generate four branches:

void f(int x, int y) {
  if (x > 0) {
     x = 2 * x;
     if (y > 0) {
        y = 2 * y;
     }
  }
}

With the previous observation in mind, fill the following table with the statement and branch coverage obtained from the tests specified in the first column. In other words, the first column defines the calls that are tested.

Test Call             Statement Coverage     Branch Coverage
f(0,0)
f(1,1)
f(0,0) and f(1,1)

6. Students get an A in a course if they score 90 or more. Thus, consider the following function that checks this requirement:

boolean isScoreA(int grade) {
  if (grade > 90)
    return true;
  else return false;
}

The implementation of this function has three statements, including one if, resulting in two branches. Now, answer the following questions:

  1. Does this function have a bug? If yes, when does it result in a failure?

  2. Suppose the function is tested with two grades: 85 and 95. What is the statement coverage in this case? And the branch coverage?

  3. Consider the following sentence: if a program has 100% coverage both at the statement and branch level, it is bug-free. Is it true or false? Justify your answer.

7. Complete the assert commands in the indicated sections.

public void test1() {
   LinkedList list = mock(LinkedList.class);
   when(list.size()).thenReturn(10);
   assertEquals(___________, ___________);
}

public void test2() {
   LinkedList list = mock(LinkedList.class);
   when(list.get(0)).thenReturn("Software");
   when(list.get(1)).thenReturn("Engineering");
   String result = list.get(0) + " " + list.get(1);
   assertEquals(___________, ___________);
}

8. Suppose two classes A and B, with A using B. To enable unit testing of A, a mock for B, called BMock, was created. The unit test of A is passing. However, the integration test of A and B is failing. Describe a more realistic version of this scenario, in which A, B, and BMock are classes with methods performing real functions. The proposed scenario should include a bug hidden by BMock. In other words, B has a bug that only appears in the integration test.