Software Engineering: A Modern Approach

Marco Tulio Valente

10 DevOps

Imagine a world where product owners, development, QA, IT Operations, and Infosec work together, not just to aid each other, but to guarantee the overall success of the organization. – Gene Kim, Jez Humble, Patrick Debois, John Willis

This chapter starts by discussing the concept of DevOps and its benefits (Section 10.1). Essentially, DevOps is a movement—or more specifically, a set of concepts and practices—aiming to introduce agile principles in the last mile of a software project, i.e., when the system is entering production. Besides discussing the concept, we address three important practices when adopting DevOps. They are: Version Control (Section 10.2), Continuous Integration (Section 10.3), and Continuous Deployment (Section 10.4).

10.1 Introduction

So far, in this book, we studied a set of practices for high-quality and agile software development. From agile methods, like Scrum, XP, or Kanban, we learned that clients should be involved from day one in the software construction process. We also discussed significant practices for producing high-quality software, like unit tests and refactoring. Additionally, we examined several design principles and patterns.

Thus, after applying what we learned, the software product—or an increment of it, resulting from a sprint—is ready to go into production. This task is known as deployment, release, or delivery. Regardless of the name, it is not as simple and quick as it may seem.

Traditionally, in conventional organizations, the Information Technology area was divided into two main departments: development and operations (also called support).

Nowadays, it’s easy to imagine the problems caused by this division. Most of the time, the support team would become aware of a system on the eve of its deployment. Consequently, the deployment could be postponed for months, due to a variety of unaddressed issues, such as lacking hardware to run the new system, performance problems, incompatibility with the production database, security vulnerabilities, etc. Ultimately, these problems could result in the deployment’s cancellation and the project’s abandonment.

In summary, in this traditional model, a significant stakeholder—the system administrators or sysadmins—would only become aware of the characteristics and non-functional requirements of the new software just before the deployment. This issue was exacerbated by systems following a monolithic architecture, whose deployment can create all sorts of concerns, including bugs and regressions in working and running modules.

Therefore, to facilitate the deployment and delivery of software, the DevOps concept was proposed. Since it is a more recent term, it still lacks a consolidated definition. However, DevOps is commonly described as a movement that aims to unify the development (Dev) and operations (Ops) cultures, attempting to enable faster and more agile software deployments. This objective is reflected in the quote that opens this chapter, from Gene Kim, Jez Humble, Patrick Debois, and John Willis, who were part of a group of developers that helped propagate the DevOps principles. According to them, DevOps represents a disruption in traditional software deployment culture (link):

Instead of starting deployments at midnight on Friday and spending the weekend working to complete them, deployments occur on any business day when everyone is in the company and without customers noticing—except when they encounter new features and bug fixes.

However, DevOps does not advocate for creating a new professional role responsible for both development and deployment. Instead, the idea is to build a closer relationship between the development and operations teams, aiming to make software deployment more agile and less traumatic. To express this in other words, the aim is to avoid two independent silos: developers and operators, with little to no interaction between them, as illustrated in the following figure.

Organization that is not based on DevOps because there is little communication between Dev and Ops.

Instead, DevOps supporters argue that these professionals should work together from the early sprints of a project, as illustrated in the next figure. For the customers, the benefit should be the earlier delivery of the contracted software project.

Organization based on DevOps. Devs and Ops often sit together to discuss issues regarding software delivery.

When transitioning to a DevOps culture, agile teams can incorporate an operations professional, who engages in the work either part-time or full-time. Depending on the demand, this professional may also contribute to more than one team. As part of their work, they will proactively address performance problems, security issues, incompatibilities with other systems, etc. In parallel with the coding tasks, they can also work on the installation, administration, and monitoring scripts for the production software.

Importantly, DevOps advocates automating all necessary steps to put a system into production and to monitor its correct operation. This requires the adoption of practices we have already studied in this book, notably automated tests. But it also requires the use of new practices and tools, such as Continuous Integration and Continuous Deployment, which we will examine in this chapter.

Real World: The term DevOps began to be used in the late 2000s by professionals frustrated with the constant friction between development and operations teams. They became convinced that the solution would be to adopt agile principles not only in development but also in the deployment phase. To point out a specific date, the first industry conference on the topic, called DevOpsDays, took place in Belgium in October 2009. It is generally considered that the term DevOps was coined at this conference, which was organized by Patrick Debois (link).

Finally, we’ll comment on a set of principles for software delivery, proposed by Jez Humble and David Farley (link). Although proposed before DevOps gained traction, they are perfectly aligned with this movement. Some of these principles are as follows:

10.2 Version Control

As we mentioned numerous times in this book, software is developed in teams. Therefore, we need a repository, i.e., a server, to store the source code of the system that is being implemented by such teams. The existence of this server is very important for these developers to collaborate and for the operators to know precisely which version of the system should be deployed to production. Moreover, it keeps a history of the most important versions of each file. If needed, this history allows developers to undo changes and recover the code of a file as it was years ago, for example.

A Version Control System (VCS) offers the two services mentioned in the previous paragraph. First, it provides a repository to store the most recent version of a system’s source code, as well as related files, such as documentation files, configuration files, web pages, wikis, etc. Second, it allows retrieval of older versions of any file, if necessary. As we said in the previous section, it is, nowadays, inconceivable to develop any system, no matter how simple, without a VCS.

The first version control systems emerged in the early 1970s, such as the SCCS system, developed for the Unix operating system. Subsequently, other systems appeared, such as CVS in the mid-1980s, and then the Subversion system, also known by its acronym svn, in the early 2000s. All are centralized systems based on a client/server architecture (see the next figure). In this architecture, a single server stores the repository and the version control system. Clients access this server to obtain the most recent version of a file. After that, they can modify the file, for example, to fix a bug or implement a new feature. Finally, they update the file on the server, performing an operation called commit, which makes the file visible to other developers.

Centralized VCS. There is a single server.

In the early 2000s, Distributed Version Control Systems (DVCS) began to appear. Among them, we can mention the BitKeeper system, whose first release was in 2000, and the Mercurial and Git systems, both launched in 2005. Instead of a client/server architecture, a DVCS follows a peer-to-peer architecture. In practice, this means that each developer has a full version control system on their machine, which can communicate with the servers on other machines, as illustrated in the next figure.

Distributed VCS (DVCS). Each client has a server. Thus, the architecture is peer-to-peer.

In theory, when using a DVCS, the clients (or peers) are functionally equivalent. However, in practice, there is usually a primary machine that holds the reference version of the source code. In our figure, we call this repository the central repository. Each developer can work independently and even offline on their client machine, making commits to their repository. From time to time, they should synchronize this repository with the central one, through two operations: pull and push. A pull updates the local repository with new commits available in the central repository. Conversely, a push performs the opposite operation; that is, it sends the latest commits made by the developer in their local repository to the central one.
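The pull and push operations can be pictured with a toy model. The sketch below is not how Git or any real DVCS stores data: `Repo`, `pullFrom`, and `pushTo` are hypothetical names, commits are plain strings in an ordered list, and the history is assumed to be linear and free of conflicts.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a repository: an ordered list of commit identifiers.
// Hypothetical names; real DVCSs store a commit graph, not a list.
class Repo {
    final List<String> commits = new ArrayList<>();

    // Record a new commit locally (this works offline).
    void commit(String id) {
        commits.add(id);
    }

    // pull: copy the commits that exist in 'other' but not here.
    void pullFrom(Repo other) {
        for (String c : other.commits) {
            if (!commits.contains(c)) {
                commits.add(c);
            }
        }
    }

    // push: send the local commits that 'other' does not have yet.
    void pushTo(Repo other) {
        other.pullFrom(this);
    }
}

class PullPushDemo {
    public static void main(String[] args) {
        Repo central = new Repo();
        Repo alice = new Repo();
        Repo bob = new Repo();

        alice.commit("a1");      // local commit; central is unaware of it
        alice.pushTo(central);   // central now has a1
        bob.pullFrom(central);   // bob now has a1
        bob.commit("b1");
        bob.pushTo(central);     // central now has a1 and b1

        System.out.println(central.commits); // prints [a1, b1]
    }
}
```

In this model, a push is simply a pull seen from the other side, which matches the description of the two operations as opposites.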

Compared to centralized VCS, a DVCS has the following advantages:

Git is a distributed version control system whose development was led by Linus Torvalds, also responsible for creating the Linux operating system. In the early years, the development of the Linux kernel used a commercial version control system called BitKeeper, which also followed a distributed architecture. However, in 2005, the company that owned BitKeeper decided to revoke the free licenses used in the development of Linux. The Linux developers, led by Torvalds, then decided to implement their own DVCS, which they named Git. Like Linux, Git is an open-source system that can be freely installed on any machine. Git is also a command-line system. However, there are graphical interface clients—developed by third parties—that allow using Git without typing commands.

GitHub is a code hosting service that uses the Git system to provide version control. GitHub offers free public repositories for open-source projects and paid private repositories for corporate use. Thus, instead of maintaining a DVCS internally, a software company can subscribe to this service from GitHub. A comparison can be made with email services. Instead of installing an email server locally, a company usually accesses this service from third parties, like Google, through Gmail. Although GitHub is the most popular, similar services are provided by other companies, like GitLab and Bitbucket.

In Appendix A, we present and illustrate the main commands of the Git system. Concepts such as forks and pull requests, which are specific to GitHub, are also explained.

10.2.1 Multirepos vs Monorepos

As we mentioned before, a VCS manages repositories. Thus, an organization needs to decide which repositories it will create in its VCS. A common decision is to create one repository for each project or system in the organization. However, solutions based on a single repository are also possible and usually adopted by large companies, such as Google, Facebook, and Microsoft. These two alternatives, referred to as multirepos and monorepos, respectively, are illustrated in the next figures.

Multirepos: the VCS manages several repositories. Normally, one repository per project.
Monorepos: the VCS manages a single repository. Projects are directories of this repository.

If we think in terms of GitHub accounts and repositories, we can give the following examples:

Among the advantages of monorepos, we can mention:

On the other hand, monorepos require specific tools to navigate large codebases. For example, those in charge of Google’s monorepo have commented that they were forced to implement a plug-in for the Eclipse IDE to facilitate working with a very large codebase, like the one they have internally in the company (link).

10.3 Continuous Integration

We start with a motivational example and then we introduce the concept of Continuous Integration (CI). After that, we discuss complementary practices that an organization should adopt along with CI. We end with a brief discussion about scenarios that may discourage the use of CI in an organization.

10.3.1 Motivation

Before defining Continuous Integration, let’s describe the problem that led to the proposal of this integration practice. Traditionally, it was common to use branches during the implementation of new features. Branches can be understood as internal and virtual sub-directories, managed by the version control system. In these systems, there is a principal branch, known as main (when using Git) or trunk (when using other systems, like SVN). In addition to the main branch, users can create their own branches.

For example, before implementing a new feature, it may be common to create a branch to hold its code. These branches are called feature branches, and depending on the complexity of the feature, they may take months to be merged back into the main development line. In fact, in larger and complex projects, there can be dozens of active feature branches.

When the implementation of the new feature is finished, the code from the branch must be copied back to the main branch through a command provided by the version control system called merge. At this point, a variety of conflicts can occur, which are called integration or merge conflicts.
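To make the idea of a merge conflict more concrete, the following sketch compares three versions of a file line by line: the common ancestor, the main branch, and the feature branch. It is a deliberate simplification, not the algorithm used by any particular version control system: it assumes all three versions have the same number of lines, and the names `MergeSketch` and `conflicts` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified three-way comparison, line by line.
// Assumes the three versions have the same number of lines
// (real merge tools also handle insertions and deletions).
class MergeSketch {
    // Returns the 0-based indexes of lines changed differently on both sides.
    static List<Integer> conflicts(List<String> base,
                                   List<String> mainline,
                                   List<String> branch) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < base.size(); i++) {
            boolean mainChanged = !mainline.get(i).equals(base.get(i));
            boolean branchChanged = !branch.get(i).equals(base.get(i));
            boolean sameEdit = mainline.get(i).equals(branch.get(i));
            if (mainChanged && branchChanged && !sameEdit) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> base     = List.of("int total = 0;", "print(total);");
        List<String> mainline = List.of("long total = 0;", "print(total);");
        List<String> branch   = List.of("double total = 0;", "print(total);");
        System.out.println(conflicts(base, mainline, branch)); // prints [0]
    }
}
```

A line changed on only one side can be merged automatically; a conflict appears only when both sides changed the same line in different ways, which is why someone has to decide manually which edit wins.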

To illustrate, suppose Alice created a branch to implement a new feature X in her system. Since this feature is complex, Alice worked in isolation on her branch for 40 days, as shown in the next figure (each node of this graph is a commit). Note that while Alice was working—and committing on her branch—commits were also being made on the main branch.

Development using feature branches

After 40 days, when Alice merged her code into the main branch, numerous conflicts arose, such as:

In large systems, with thousands of files, dozens of developers, and several feature branches, the problems caused by conflicts can assume considerable proportions and delay the deployment of new features. Note that conflict resolution is a manual task, requiring analysis and consensus among the involved developers. That explains why the terms integration hell or merge hell are commonly used to describe the problems related to the integration of feature branches.

Additionally, feature branches, especially the long-lasting ones, help to create knowledge silos. That is, each new feature ends up having an owner, as a developer was dedicated to it for weeks. Therefore, this developer may feel comfortable adopting different patterns from the rest of the team, including architectural and design patterns, code layout patterns, user interface patterns, etc.

10.3.2 What is Continuous Integration?

Continuous Integration (CI) is a programming practice proposed by Extreme Programming (XP). The motivation behind the practice was already discussed in the first section of this chapter: if a task causes pain, we should not let it accumulate. Instead, we should break it into subtasks that can be performed frequently. As these subtasks are small and simple, they will cause less pain.

In our context, large integrations are a major source of pain for developers, as they have to manually resolve multiple conflicts. Therefore, CI recommends integrating the code frequently, that is, continuously. Thus, the integrations will be small and will produce fewer conflicts.

In his XP book, Kent Beck defends the use of CI as follows (link, page 49):

Integrate and test changes after no more than a couple of hours. Team programming isn’t a divide-and-conquer problem. It’s a divide, conquer, and integrate problem. The integration step can easily take more time than the original programming. The longer you wait to integrate, the more it costs and the more unpredictable the cost becomes.

In this quote, Beck advocates several integrations over a developer’s workday. However, there is no consensus on this recommendation. Other authors, like Martin Fowler, mention at least one integration per day per developer (link), which seems to be the minimum for a team to claim that it is using CI.

10.3.3 Best Practices When Using CI

When using CI, the main branch is constantly updated with new code. To ensure that it is not broken—that is, that the code compiles and runs with success—some practices should be used along with CI, as discussed next.

Automated Build

Build designates the process of compiling and producing an executable version of a system. When using CI, this process must be automated, that is, it should not include manual steps. Furthermore, it should be as quick as possible, since with CI, the build is called continuously. Some authors, for instance, recommend a limit of 10 minutes for performing a build (link).

Automated Tests

Besides ensuring that the system compiles without errors after a new integration, it is also important to ensure that it continues to run and to produce the expected results. Therefore, when using CI, we should have good test coverage, especially by unit tests, as studied in Chapter 8.

Continuous Integration Servers

Lastly, automated builds and tests should be executed frequently, preferably before any code is integrated in the main branch. For this, we can use CI Servers, which work as follows (also see the next figure):

Continuous Integration Server

The main goal of a CI server is to prevent the integration of code with errors, whether compilation or logic errors. For example, the build on the developer’s machine may have been successfully completed. But when executed on the CI server, it might fail. This happens, for instance, when the developer forgets to commit a file. Incorrect dependencies are another reason for build breaks. For example, the code might be compiled and tested on the developer’s machine using version 2.0 of a certain library, but the CI server performs the build using version 1.0.
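The gatekeeping role of a CI server can be sketched as a sequence of checks that must all pass before code is integrated. The example below is only an illustration under simple assumptions: `CiGate` and its methods are hypothetical names, and the build and test steps are stand-in lambdas rather than calls to a real build tool.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical CI gate: runs each step in order and stops at the first failure.
class CiGate {
    private final Map<String, BooleanSupplier> steps = new LinkedHashMap<>();

    CiGate addStep(String name, BooleanSupplier step) {
        steps.put(name, step);
        return this;
    }

    // Returns the name of the first failing step, or null if all steps pass.
    String firstFailure() {
        for (Map.Entry<String, BooleanSupplier> e : steps.entrySet()) {
            if (!e.getValue().getAsBoolean()) {
                return e.getKey();
            }
        }
        return null;
    }

    boolean canIntegrate() {
        return firstFailure() == null;
    }
}

class CiGateDemo {
    public static void main(String[] args) {
        // Stand-ins for a real build and test run; here the build "fails"
        // because the server resolves version 1.0 of a library, not 2.0.
        String requiredVersion = "2.0";
        String versionOnServer = "1.0";

        CiGate gate = new CiGate()
            .addStep("build", () -> requiredVersion.equals(versionOnServer))
            .addStep("tests", () -> true);

        System.out.println(gate.canIntegrate());  // prints false
        System.out.println(gate.firstFailure());  // prints build
    }
}
```

The point of the sketch is that the checks run in a clean environment controlled by the server, so a problem hidden on the developer's machine, such as a mismatched library version, is caught before the code reaches other developers.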

There are several Continuous Integration servers in the market. Some of them are offered as an independent service, usually free for public repositories, but paid for private ones.

Another common question is whether CI is compatible with the use of feature branches. In order to keep coherence with the definition of CI, the best answer is the following: yes, as long as the branches are frequently integrated into the main branch, for instance, every day. In other words, CI is incompatible only with long-lived feature branches.

Trunk-Based Development

As we’ve seen, when adopting CI, branches must last for a maximum of one working day. Therefore, the cost/benefit of creating them may not be worth it. For this reason, when shifting to CI, it’s common to also use Trunk-Based Development (TBD). With TBD, there are no more branches for new feature implementation or bug fixes (or they exist only in the developer’s local repository and thus have a short duration). Thus, all development takes place on the main branch, also known as the trunk.

Real World: TBD is used by major software companies. For example, at Google, almost all development occurs at the HEAD of the repository, not on branches. This helps identify integration problems early and minimizes the amount of merging work needed. It also makes it much easier and faster to push out security fixes (link). At Facebook, all front-end engineers work on a single stable branch of the code, which also promotes rapid development, since no effort is spent on merging long-lived branches into the trunk (link).

Pair Programming

Pair Programming can be viewed as a continuous form of code review. When adopting this practice, any new piece of code is reviewed by another developer, who sits next to the lead developer of the programming session. Thus, like continuous builds and tests, it is usually recommended to use Pair Programming with CI. However, this use is not mandatory. For example, the code can be reviewed after the commit reaches the mainline. In this case, given that the code is visible to other developers and can be moved into production at any time, the costs of applying a revision are higher.

10.3.4 When not to use CI?

CI proponents set a firm limit for integrations on the mainline: at least one integration per day per developer. However, depending on the organization, on the domain of the system (which can be a critical application, for instance), and on the profile of the developers (who might be beginners), it might be challenging to follow this limit.

Moreover, this limit is not a law of physics. For instance, it might be worthwhile to perform an integration every two or three days. In fact, any software engineering practice, including Continuous Integration, should not be taken literally, that is, exactly as it is described in a manual or textbook. Context-justified adaptations are possible and should be carefully considered. Thus, experimenting with different integration intervals can help to define the best setup for your organization.

CI is also not compatible with open-source projects. Most often, the developers of these projects are volunteers and do not work daily on the code. In these cases, a model based on Pull Requests and Forks, as proposed by GitHub, tends to be the best decision. In Appendix A, we will give more details about these concepts.

10.4 Continuous Deployment

With Continuous Integration, new code is often integrated into the main branch. However, this code doesn’t have to be ready for production. That is, it can be a preliminary version, integrated so that other developers become aware of its existence and, consequently, avoid future integration conflicts. For example, you can integrate a preliminary version of a web page, with a poor interface, or a feature with known performance issues.

However, there is another step in the automation chain proposed by DevOps, called Continuous Deployment (CD). The difference between CI and CD is simple, but the impacts are profound: when using CD, every new commit that reaches the main branch is deployed into production, in a matter of hours, for instance. The workflow when using CD is as follows:

Among the advantages of CD, we can mention:

Real World: Various companies that develop web apps use CD. For instance, in an article published in 2016, Savor and colleagues reported that at Facebook, each developer, on average, used to put 3.5 updates into production a week (link). These updates added or modified an average of 92 lines of code. These numbers reveal that, to work well, CD requires small updates. Therefore, developers have to develop the skill to break a programming task (e.g., a new feature, even if complex) into small parts that can be quickly implemented, tested, and deployed.

10.4.1 Continuous Delivery

Continuous Deployment (CD) is not recommended for certain types of systems, including desktop apps (like an IDE or a web browser) and embedded software (like a printer driver). You probably wouldn’t want to be notified daily that there is a new version of your browser or that a new driver is available for your printer. Unlike a web system update, which is transparent to users, updating these systems requires an installation process that users notice.

However, in such cases, a variant known as Continuous Delivery can be used. The idea is simple: when using Continuous Delivery, every commit can be pushed into production immediately. That is, developers should program as if this was going to happen. However, there is an external authority—a project or release manager, for instance—who decides when the commits will actually be released to customers. Marketing and corporate strategies are examples of forces that can influence this decision.

Another way to explain these practices is through the following difference:

When we adopt Continuous Deployment, both processes are automatic and continuous. But with Continuous Delivery, delivery is carried out frequently, while deployment depends on manual authorization.
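The contrast between the two practices can be reduced to a single decision: once a commit has passed the automated pipeline, does it still need a human approval before reaching customers? The sketch below encodes that decision. It is a simplification with hypothetical names (`ReleasePolicy`, `ReleaseDecision`, `releaseToCustomers`), not a description of any real pipeline tool.

```java
// Hypothetical release policy: in this sketch, the only difference between
// the two practices is whether a human must approve the release.
enum ReleasePolicy { CONTINUOUS_DEPLOYMENT, CONTINUOUS_DELIVERY }

class ReleaseDecision {
    // A commit reaches customers only if it passed the pipeline and,
    // under Continuous Delivery, someone approved the release.
    static boolean releaseToCustomers(ReleasePolicy policy,
                                      boolean pipelinePassed,
                                      boolean managerApproved) {
        if (!pipelinePassed) {
            return false;
        }
        if (policy == ReleasePolicy.CONTINUOUS_DEPLOYMENT) {
            return true;
        }
        return managerApproved;
    }

    public static void main(String[] args) {
        System.out.println(releaseToCustomers(
            ReleasePolicy.CONTINUOUS_DEPLOYMENT, true, false)); // prints true
        System.out.println(releaseToCustomers(
            ReleasePolicy.CONTINUOUS_DELIVERY, true, false));   // prints false
    }
}
```

In both cases the commit must be releasable; the policies differ only in who pulls the trigger.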

Regardless of whether they adopt Continuous Deployment or Continuous Delivery, software companies are nowadays reducing their release cycles to keep users engaged, receive feedback, keep developers motivated, and remain competitive in the market. This is happening even with desktop apps. For example, as of 2024, Google ships a major version of the Chrome browser every four weeks. Moreover, weekly updates are used to deploy security fixes and keep Chrome’s patch gap short.

10.4.2 Feature Flags

It is not reasonable to assume that every commit will be ready to be immediately deployed into production. For example, a developer may be working on a new feature X but still needs to implement part of its logic. Therefore, this developer may ask:

If new releases happen almost every day, how can I prevent my unfinished implementation, which has not been properly tested and has performance issues, from reaching the company’s customers?

One solution is to refrain from integrating the code into the main development branch. However, we no longer want to use this practice, as it leads to what we called integration or merge hell. In other words, we don’t want to give up Continuous Integration and Trunk-Based Development.

A solution to this problem is as follows: continuously integrate the partial code of feature X, but with its code disabled, meaning, any code related to X is guarded by a boolean variable (or flag) that, while the implementation is not finished, evaluates as false. A hypothetical example is shown next:

featureX = false;
if (featureX) 
   "here is my incomplete code for X"
if (featureX)
   "more incomplete code for X"

In Continuous Deployment, variables used to prevent the deployment of partial implementations are called Feature Flags or Feature Toggles.

To illustrate with a second example, suppose you’re working on a new page of a certain web application. Then you can use a feature flag to enable/disable this page, as shown next:

new_page = false;
if (new_page) 
   "show new page"
else
   "show old page"

This code will go into production while the new page is not ready. However, during the implementation, locally, on your machine, you can enable the new page by setting the new_page flag to true.

As you can notice, there is code duplication between both pages for a certain time. However, after the new page is approved, goes into production, and receives positive feedback from customers, the old page’s code and the feature flag (new_page) can be removed. Thus, the duplication was temporary.

Real World: Researchers from two Canadian universities, led by Professors Peter Rigby and Bram Adams, conducted a study on the use of feature flags over 39 releases of the Chrome browser, for five years of development, from 2010 to 2015 (link). During this period, they found more than 2,400 distinct feature flags in the browser’s code. In the first version analyzed, they cataloged 263 flags; in the last version, the number increased to 2,409. On average, a new release added 73 new flags and removed 43 flags. Hence, the growth seen in the study.

However, feature flags can be kept in the code during the deployment phase. This can occur for two reasons, as described below.

First, feature flags help implement what is called a canary release. In this type of release, a new feature, guarded by a feature flag, is initially made available to a small group of users, for example, only 5% of them. This way, the problems caused by potential bugs in this new feature will be minimized. Then, after a successful initial deployment, the user base that accesses the new feature is gradually increased until it reaches all users. The term canary release refers to a practice that was once common in coal mines. Miners used to enter the mines with a canary in a cage. If the mine contained any toxic gas, it would kill the canary first, and the miners could withdraw before being poisoned.
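One way to implement a gradual rollout is to map each user to a stable bucket and enable the feature only for buckets below the current percentage. The sketch below follows this idea under explicit assumptions: `CanaryFlag`, `bucketOf`, and `isEnabledFor` are hypothetical names, and `String.hashCode` is used for bucketing only to keep the example short. Its distribution is not guaranteed to be uniform, so a production system would likely use a stronger hash.

```java
// Hypothetical canary rollout: a user sees the feature if a stable
// bucket derived from the user id falls below the rollout percentage.
class CanaryFlag {
    private final int percentage; // 0..100

    CanaryFlag(int percentage) {
        if (percentage < 0 || percentage > 100) {
            throw new IllegalArgumentException("percentage must be in 0..100");
        }
        this.percentage = percentage;
    }

    // Maps a user id to a bucket in 0..99; the same id always gets the same bucket.
    static int bucketOf(String userId) {
        return Math.floorMod(userId.hashCode(), 100);
    }

    boolean isEnabledFor(String userId) {
        return bucketOf(userId) < percentage;
    }
}

class CanaryDemo {
    public static void main(String[] args) {
        CanaryFlag newCart = new CanaryFlag(5); // start with roughly 5% of users
        int enabled = 0;
        for (int i = 0; i < 10_000; i++) {
            if (newCart.isEnabledFor("user-" + i)) {
                enabled++;
            }
        }
        System.out.println(enabled + " of 10000 users see the new feature");
    }
}
```

Because the bucket depends only on the user id, raising the percentage from 5 to 50 keeps the feature enabled for everyone who already had it, so users do not see the feature appear and disappear as the rollout grows.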

In addition, feature flags help enable A/B Tests, as we studied in Chapter 3. To recall, in these tests, two versions of a feature (old vs. new, for instance) are simultaneously released to distinct user groups, aiming to verify if the new feature indeed adds value to the current implementation.

To facilitate the execution of canary releases and A/B tests, we can use a data structure to store the flags and their state (on or off). An example is shown next:

FeatureFlagsTable fft = new FeatureFlagsTable();
fft.addFeature("new-shopping-cart", false);
if (fft.isEnabled("new-shopping-cart"))
   // process purchase using new cart
else
   // process purchase using current cart
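The text does not show how FeatureFlagsTable is implemented. One possible in-memory version, given here as an assumption rather than as the API of any library, is a thin wrapper around a map. The query method is spelled `isEnabled`, following Java naming conventions, and `ShoppingCartDemo` is a hypothetical class used only to exercise the table.

```java
import java.util.HashMap;
import java.util.Map;

// One possible in-memory implementation (an assumption, not a library API).
class FeatureFlagsTable {
    private final Map<String, Boolean> flags = new HashMap<>();

    // Registers a flag or overwrites its current state.
    void addFeature(String name, boolean enabled) {
        flags.put(name, enabled);
    }

    // Flags that were never registered are treated as disabled.
    boolean isEnabled(String name) {
        return flags.getOrDefault(name, false);
    }
}

class ShoppingCartDemo {
    static String processPurchase(FeatureFlagsTable fft) {
        if (fft.isEnabled("new-shopping-cart")) {
            return "new cart";
        } else {
            return "current cart";
        }
    }

    public static void main(String[] args) {
        FeatureFlagsTable fft = new FeatureFlagsTable();
        fft.addFeature("new-shopping-cart", false);
        System.out.println(processPurchase(fft)); // prints current cart
        fft.addFeature("new-shopping-cart", true);
        System.out.println(processPurchase(fft)); // prints new cart
    }
}
```

Treating unknown flags as disabled is a conservative default: a typo in a flag name hides a feature instead of exposing unfinished code.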

There are also libraries dedicated to managing feature flags, which provide classes similar to FeatureFlagsTable in the previous code. The advantage here is that the flags can be set externally to the program, for example, in a configuration file. On the other hand, when the flag is an internal boolean variable, to change its value, the code needs to be edited and recompiled.
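As a minimal sketch of flags set outside the program, the example below reads flag values from configuration text using the standard `java.util.Properties` class. The class `ConfigFeatureFlags` and the flag names are hypothetical, and the configuration is passed as an in-memory string only to keep the example self-contained; in practice it would come from a file.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical flag table backed by java.util.Properties, so flag values
// come from configuration text rather than from constants in the code.
class ConfigFeatureFlags {
    private final Properties props = new Properties();

    ConfigFeatureFlags(Reader config) {
        try {
            props.load(config);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Unknown flags default to disabled.
    boolean isEnabled(String feature) {
        return Boolean.parseBoolean(props.getProperty(feature, "false"));
    }
}

class ConfigFlagsDemo {
    public static void main(String[] args) {
        // In practice this text would live in a file read with a FileReader.
        String config = "new-shopping-cart=true\nnew-page=false\n";
        ConfigFeatureFlags flags = new ConfigFeatureFlags(new StringReader(config));

        System.out.println(flags.isEnabled("new-shopping-cart")); // prints true
        System.out.println(flags.isEnabled("new-page"));          // prints false
        System.out.println(flags.isEnabled("unknown-flag"));      // prints false
    }
}
```

With this design, turning a feature on or off means editing one line of configuration; the code does not have to be edited or recompiled.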

In-Depth: In this section, our focus was on the use of feature flags to prevent a piece of code from reaching customers when an organization is using Continuous Deployment. Feature flags with this purpose are also called release flags. However, feature flags can be used for other purposes. An example is creating different versions of the same software. For instance, suppose a system has a free and a paid version. Customers of the paid version have access to more features, whose code is delimited by feature flags. In this specific case, the flags are called business flags.


Bibliography

Gene Kim, Jez Humble, John Willis, Patrick Debois. The DevOps Handbook: How to Create World-Class Agility, Reliability, & Security in Technology Organizations. IT Revolution Press, 2016.

Jez Humble, David Farley. Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation. Addison-Wesley, 2010.

Steve Matyas, Andrew Glover, Paul Duvall. Continuous Integration: Improving Software Quality and Reducing Risk. Addison-Wesley, 2007.


Exercises

1. Define and describe the objectives of DevOps.

2. Job offers in the IT sector often mention vacancies for a DevOps Engineer, requiring skills such as:

Considering the definition of DevOps you used in your answer to the previous exercise, do you think it’s appropriate for an employee’s role to be a DevOps Engineer? Justify your answer.

3. Describe two advantages of a Distributed Version Control System (DVCS).

4. Describe a disadvantage associated with the use of mono-repositories.

5. Define (and differentiate) the following terms: Continuous Integration; Continuous Delivery; and Continuous Deployment.

6. Why are Continuous Integration, Continuous Delivery, and Continuous Deployment important practices in DevOps? In your answer, consider the definition of DevOps that you used in the first exercise in this list.

7. Look up the meaning of the term CI Theater. Then, define it in your own words.

8. Suppose you were hired by a printer company and became responsible for defining the DevOps practices that will be adopted in the development of the printers’ drivers (software). Which of the following practices would you adopt in this case: Continuous Deployment or Continuous Delivery? Justify your answer.

9. Describe a problem (or challenge) that arises when we use feature flags to delimit code that is not ready to go into production.

10. Programming languages such as C support conditional compilation directives such as #ifdef and #endif. Look up the functionality and usage of these directives. What is the difference between them and feature flags?

11. Which type of feature flags has a longer lifespan (i.e., stays in the code longer): release flags or business flags? Justify your answer.

12. When companies migrate to CI, they usually no longer use feature branches. Instead, they have a single branch, which is shared by all developers. This practice is called Trunk-Based Development (TBD), as we studied in this chapter. However, TBD does not mean that branches are no longer used in these companies. Thus, describe another usage for branches, which is not to implement new features.

13. Read the following article from the official Gmail blog, which describes a major interface update made in 2011. The article compares the challenges of this migration to those of changing the tires of a car while it is moving. Regarding this article, answer:

  1. Which technology that we studied in this chapter was fundamental to facilitating this update to the Gmail interface? How does the article refer to this technology?

  2. What term do we use in this chapter to reference it?