Book cover
All rights reserved. Version for personal use only.
This web version is subjected to minor edits. To report errors or typos, use this form.

Home | Dark Mode | Cite

Software Engineering: A Modern Approach

Marco Tulio Valente

1 Mutation Testing: A Practical Overview

1.1 Introduction

The concept of mutation testing was first proposed in 1978 in the following paper. To better understand the concept, the key point to remember is that a mutation test does not intend to detect bugs in the production code, as is the case, for example, with unit tests, integration, end-to-end, snapshot tests, etc.

Instead, mutation tests are used to evaluate the effectiveness of the automated tests that already exist in the system. In other words, the assumption is that you already have several tests and you want to know if they are effective, that is, truly capable of detecting bugs and regressions.

To do this, a mutation testing tool makes small modifications to the production code, generating a version of the code referred to as a mutant. For example, mutants can be created through the following operations:

  • Removal or duplication of a command or expression.
  • Changing operators, for example, replacing an addition (+) with a subtraction (-).
  • Inserting an operator, for example, cond becomes !cond.
  • Changing a constant, for example, True becomes False.

As the mutations are made arbitrarily, they naturally should introduce bugs. And, then, the existing tests should fail when run on them. If this does not happen, we can conclude that the tests are not good enough.

Mutation testing is a type of white-box testing, as its implementation requires knowledge of the internal code of the system’s functions. As stated, this knowledge is necessary to generate the mutations.

The next figure—taken from a blog post by Google—illustrates a real use of mutation testing. The gray background message informs that a mutation was made in the expression of an if command, changing a == b for a != b. However, as also reported in the message, this mutation did not cause the failure of any existing tests.

Example of a mutation that survived the execution of the tests. Source: Google Testing Blog

When encountering the above message, the developer should analyze the performed mutation and identify the system behavior that was compromised by it. Subsequently, they should write a unit test that exercises that behavior and, therefore, fails when executed on the mutant.

1.2 Mutation Score

The mutation score is a widely used metric with this type of testing. It is defined as follows:

mutation score = number of killed mutants / total number of generated mutants

A mutant is said to have been killed when it is detected by some existing test. Therefore, ideally, we would expect the mutation score to be 100%.

1.3 Mutation Testing Tools

There are several tools available for mutation testing. In the case of Java, one of the most popular tools is Pitest, which implements several strategies to reduce the execution time of mutation tests. Among them, we can mention the following:

  • Pitest performs mutations directly on the compiled code. That is, there is no need to compile a mutant to know whether it survives the existing tests or not.

  • Pitest doesn’t run all the program’s tests to determine whether a mutant M is killed by the tests, but only those tests that execute M’s code.

1.4 Example: JFreeChart

JFreeChart is a Java library for constructing charts. The version 1.0.19 of the system has 47 KLOC and 1,320 tests.

The following paper analyzes the use of Pitest in JFreeChart. As described in the paper, when executed on the mentioned version of JFreeChart, Pitest generates 256K mutants in 109 minutes. The resulting mutation score is 19%.

Thus, this example illustrates one of the main problems of mutation testing, that is, the high computational cost, even after all the optimizations implemented by Pitest. In a relatively small system (47 KLOC), nearly two hours were required to test all mutants.

Moreover, the achieved mutation score (19%) was low, which suggests that there’s room to write new tests for this system.

1.5 Equivalent Mutants

In specific situations, mutation operators can generate mutants that represent valid behavior, i.e., that do not introduce any bug. These mutants are called equivalent mutants.

A simple example is a mutation in dead code, that is, code that is no longer called by any part of the system.

The issue is that, by definition, it’s not possible to kill equivalent mutants. That is, as they don’t change the behavior of the program, it’s not possible to write a test that fails when executed on the mutant. In these cases, the best solution is to refactor the code to remove the situation that caused the generation of the equivalent mutant. In our example, we could simply delete the dead code, since it is no longer used in the system.

1.6 Final Remarks

Mutation testing can be viewed as being the test of the tests. That is, they are primarily useful when it’s important to have the highest quality and reliability in tests.

On the other hand, in many systems, developers are aware that the tests are not that good. Often, they know very well the parts and features of a system that need better test coverage. Therefore, in these cases, investment in mutation testing should not be a priority.


1. Consider the following function:

def isGradeA(score):
    if (score >= 90):
       return True
    return False

Also consider the following test:

def test():
  1. What is the statement coverage of this test?

  2. Generate a mutant for the function that’s not killed by the test.

  3. Modify the test so it fails with the mutant you generated.

2. Consider the following function that verifies if a bank client is VIP, depending on their balance and relationship time with the bank:

def isVIPClient(balance, time):
    if (balance > 10000) or (time > 10):
       return True
    return False

See also the existing test of this function (which has a statement coverage of 100%):

def test():
    assertTrue(isVIPClient(11000, 11))
    assertTrue(isVIPClient(10000, 11))
    assertFalse(isVIPClient(9000, 9))

So, when using a mutation testing tool, the following mutant was generated:

# mutant: first condition of the "if" was removed
def isVIPClient(balance, time):
    if (time > 10): 
       return True
    return False

Notice, however, that the presented test does not kill this mutant, i.e., it doesn’t fail when executed with the mutant.

Thus, add one more assert to the test so that it now kills the mutant.

  1. Consider the following class, now in Java:
public class Client {
  public boolean isVIP (double balance) {
    if (balance > 10000) {
       return true;
    return false;

And the following unit test:

public class Test {
  public void test1() {
    Client client = new Client();

Finally, consider the following output generated by the Pitest tool when run on the program with the previously presented class and test:

Negated conditional -> KILLED

Changed conditional boundary -> SURVIVED

Replaced boolean return with false for Client::isVIP -> KILLED

Replaced boolean return with true for Client::isVIP -> NO_COVERAGE

As we can see, four mutants were generated, two of which were killed, one survived and the last mutant was not covered by the test.

Thus, modify the unit test, adding one more assert, so that it kills all four mutants. Consequently, the new report generated by the tool should look like this:

Negated conditional -> KILLED

Changed conditional boundary -> KILLED

Replaced boolean return with false for Client::isVIP -> KILLED

Replaced boolean return with true for Client::isVIP -> KILLED

Follow us on LinkedIn or Twitter.