All rights reserved. Version for personal use only.

Software Engineering: A Modern Approach

Marco Tulio Valente

7 Architecture

Architecture is about the important stuff. Whatever that is. – Ralph Johnson

This chapter begins with an introduction to the concept of software architecture. Following that, we explore several architectural patterns, including Layered Architectures and, specifically, Three-tier Architectures (Section 7.2), MVC (Section 7.3), and Microservices (Section 7.4). In the case of microservices, we explore the context that led to the emergence of this architectural pattern and discuss its benefits and challenges. Subsequently, we discuss two architectural patterns proposed to ensure scalability and decoupling in distributed systems: Message-Oriented Architectures (Section 7.5) and Publish/Subscribe (Section 7.6). The chapter concludes by examining other architectural patterns (Section 7.7) and by providing an example of an architectural anti-pattern (Section 7.8).

7.1 Introduction

There is more than one definition for software architecture. One of the most common definitions considers that architecture is concerned with high-level design. Thus, the focus shifts from the organization and interfaces of individual classes towards larger units, such as packages, components, modules, subsystems, layers, or services—the name is not so important. Generally, these terms should be understood as sets of related classes.

In addition to being larger, architectural components must be relevant and related to the key mission of a system. For instance, suppose you work on an information system. Certainly, this system includes a module to persist data, which uses a database. This module is crucial in information systems, as their main goal is to automate and persist information related to business processes. On the other hand, suppose now you work on a system that uses artificial intelligence techniques to diagnose diseases. This system also has a persistence module that stores data from the diseases that are diagnosed. However, this module is not relevant to the main purpose of the system. Therefore, it is not a key part of its architecture.

There is also a second definition for software architecture. As expressed in Ralph Johnson’s quote that opens this chapter, it considers that software architecture refers to the most important design decisions in a system. These decisions are so crucial that once made, they are hard to revert in the future. Hence, this second way of defining architecture is more general than the one we discussed in the previous paragraph. It considers that architecture is not just a set of modules but a set of decisions. Among these decisions, the definition of the main modules of the system is definitely included. However, other decisions are also contemplated, such as the choice of the programming language and the database used by the system. In fact, once a system is implemented using a certain database, it is very hard to migrate to another one. For this reason, even today we have critical systems that use non-relational databases and that are implemented in COBOL.

Architectural patterns propose a high-level organization for software systems, including their key modules and the relations between them. These relations define, for example, whether module A may (or may not) call the methods of module B. In this chapter, we are going to study the following architectural patterns: Layered Architecture (Section 7.2), Model-View-Controller or MVC Architecture (Section 7.3), Microservices (Section 7.4), Message-oriented Architecture (Section 7.5), and Publish/Subscribe Architecture (Section 7.6). To conclude, we will briefly present other architectural patterns, such as Pipes and Filters (Section 7.7). Additionally, we will provide an example of an architectural anti-pattern known as the Big Ball of Mud (Section 7.8).

In-Depth: Some authors, like Taylor et al. (link), make a distinction between patterns and architectural styles. According to them, patterns focus on solutions to specific architectural problems, while architectural styles propose that the modules of a system should be organized in a certain way without necessarily aiming to solve a specific problem. Thus, for these authors, MVC is an architectural pattern that solves the problem of separating presentation and model in graphical interface systems. On the other hand, Pipes and Filters constitute an architectural style. In this chapter, however, we will not make this distinction. Instead, we will refer to all of them as architectural patterns.

7.1.1 Tanenbaum-Torvalds Debate

In early 1992, a heated debate over operating system architectures erupted in an internet discussion forum. Despite the participation of numerous developers and researchers in this discussion, it became known as the Tanenbaum-Torvalds Debate (link, Appendix A, page 102). Tanenbaum (Andrew) is a researcher in the field of operating systems, an author of textbooks in the area, and a professor at Vrije Universiteit in Amsterdam, the Netherlands. Torvalds (Linus) at the time was a Computer Science student at the University of Helsinki in Finland.

The discussion began when Tanenbaum posted a message in the forum titled "Linux is obsolete". His main argument was that Linux follows a monolithic architecture, where all operating system functions, such as process management, memory management, and file systems, are implemented in a single executable file running in supervisor mode. Instead, Tanenbaum advocated for a microkernel architecture, where the kernel handles only the most basic system functions, and the other ones run as independent processes outside the kernel. Linus responded emphatically, asserting that Linux was a practical operating system at the time, while the microkernel-based system under development by Tanenbaum was facing various problems and bugs. The discussion continued, and Tanenbaum even stated that Torvalds was fortunate not to have been his student; otherwise, he would not have received a good grade for the monolithic Linux architecture. An interesting comment was also made by Ken Thompson, one of the designers of the first versions of Unix:

It is in my opinion easier to implement a monolithic kernel. It is also easier for it to turn into a mess in a hurry as it is modified.

Indeed, Thompson’s prediction proved accurate. In 2009, Linus made the following declaration during a conference:

We are definitely not the streamlined, small, hyper-efficient kernel that I envisioned 15 years ago. The kernel is huge and bloated… And whenever we add a new feature, it only gets worse.

This comment is available on a Wikipedia page (link) and was the subject of several articles on technology sites at the time. It highlights that architectural decisions are not only crucial and difficult to reverse, but that their negative effects often take years to become apparent and start causing problems.

7.2 Layered Architecture

Layered architecture has been one of the most commonly used architectural patterns since the first large software systems were designed in the 60s and 70s. In systems that follow this pattern, classes are organized into modules called layers. The layers are arranged hierarchically, resembling a cake. Consequently, a layer can only use services (that is, call methods, instantiate objects, extend classes, declare parameters, throw exceptions, etc.) from the layer immediately below it.

Among other applications, layered architectures are widely used in the implementation of network protocols. For instance, HTTP is an application protocol that uses services from a transport protocol, such as TCP. TCP, in turn, relies on services from a network protocol, like IP. Finally, the IP layer uses services from a communication protocol, for example, Ethernet.

A layered architecture partitions the complexity involved in implementing a system into smaller components, namely the layers. As a second advantage, it imposes discipline on the dependencies between these layers. As mentioned earlier, layer n can only use services from layer n-1. This hierarchy aids in understanding, maintaining, and evolving a system. For instance, it becomes easier to substitute one layer for another (e.g., transitioning from TCP to UDP). Additionally, it facilitates the reuse of a layer by upper layers. For example, the transport layer can be used by various application protocols such as HTTP, SMTP, DHCP, etc.
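
The rule that layer n only uses services from layer n-1 can be illustrated with a minimal sketch of the protocol stack mentioned above. The class names and the bracketed "frames" are hypothetical and serve only to make the call chain visible; real protocol implementations are far more involved.

```javascript
// Hypothetical four-layer stack. Each layer holds a reference only to the
// layer immediately below it and never skips a level.
class LinkLayer {
  send(payload) {
    return `[ETH ${payload}]`;
  }
}

class NetworkLayer {
  constructor(link) { this.link = link; }
  send(payload) {
    return this.link.send(`[IP ${payload}]`);
  }
}

class TransportLayer {
  constructor(network) { this.network = network; }
  send(payload) {
    return this.network.send(`[TCP ${payload}]`);
  }
}

class ApplicationLayer {
  constructor(transport) { this.transport = transport; }
  get(path) {
    // The application layer knows nothing about IP or Ethernet.
    return this.transport.send(`[HTTP GET ${path}]`);
  }
}

const stack = new ApplicationLayer(
  new TransportLayer(new NetworkLayer(new LinkLayer()))
);
const frame = stack.get("/index.html");
console.log(frame); // [ETH [IP [TCP [HTTP GET /index.html]]]]
```

Because `ApplicationLayer` depends only on an object with a `send` method, a different transport could be passed to its constructor without touching the layers below.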

In-Depth: One of the early proposals for a layered architecture was developed by Edsger W. Dijkstra in 1968 for an operating system called THE (link). The layers proposed by Dijkstra were as follows: multiprogramming (layer 0), memory allocation (layer 1), interprocess communication (layer 2), input/output management (layer 3), and user programs (layer 4). Dijkstra concluded his article by emphasizing that the benefits of hierarchical structures are more critical with larger projects.

7.2.1 Three-Tier Architecture

This architectural pattern is common when building enterprise information systems. Until the late 80s, enterprise applications—such as payroll, inventory control, or accounting systems—were executed on mainframes, which were physically large and expensive computers. These applications were monolithic and accessed through terminals without any processing capacity and having only a textual interface. However, with advancements in network and hardware technologies, it became possible to migrate these systems to other platforms. That’s when three-tier architectures became an alternative.

The three layers of this architecture are as follows:

  • User Interface: Also known as the presentation layer, it is responsible for all user interaction, handling the display of data, and processing inputs and interface events such as button clicks and text highlighting. Usually, this layer is implemented as a desktop application. For example, an academic system should provide a graphical interface for instructors to enter grades for their classes. The main element of this interface can be a form with two columns: student name and grade. The code implementing this form resides in the interface layer.

  • Business Logic: Also known as the application layer, it implements the system’s business rules. In the academic system example, a business rule could require that grades must be greater than or equal to zero and less than or equal to the value of the exam. When an instructor enters the grades for an exam, it’s up to the logic layer to check whether this rule is followed. Another business rule could state that after the grades are entered, the students should be notified via email.

  • Database: This layer stores the data manipulated by the system. For example, in our academic system, after the grades are entered and validated, they are saved in a database.
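
As an illustration of the kind of code that belongs to the business logic layer, the following sketch checks the grade rule from the academic system example. The function name `validateGrades` and the shape of the grade records are assumptions made for this example.

```javascript
// Business rule (hypothetical implementation): every grade must be a number
// between 0 and the value of the exam, inclusive.
function validateGrades(grades, examValue) {
  const invalid = grades.filter(
    (g) => !Number.isFinite(g.grade) || g.grade < 0 || g.grade > examValue
  );
  return { ok: invalid.length === 0, invalid };
}

// The interface layer would collect these rows from the form, and the
// database layer would store them only if the check passes.
const result = validateGrades(
  [
    { student: "Alice", grade: 18 },
    { student: "Bob", grade: 25 },
  ],
  20
);
console.log(result.ok, result.invalid.map((g) => g.student)); // false [ 'Bob' ]
```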

As shown in the next figure, a three-tier architecture is a distributed architecture. This means that the interface layer runs on clients’ machines, the business layer runs on a server (often called an application server), and the database runs on a separate database server.

Three-tier architecture

In three-tier architectures, the application layer can have various modules, including a facade to facilitate system access for clients, and a persistence module with the function of isolating the database from the other modules.

Finally, it’s worth mentioning that it’s possible to have two-tier architectures. In this case, the interface and application layers are combined into a single layer, which runs on the client. The second layer is the database. The disadvantage of such architectures is that all processing occurs on the clients, which, therefore, must have more computational power.

7.3 MVC Architecture

The MVC (Model-View-Controller) architectural pattern was proposed in the late 70s and subsequently used in the implementation of Smalltalk-80, which is one of the earliest object-oriented languages. Besides incorporating object-oriented concepts, Smalltalk played a pioneering role in introducing Graphical User Interfaces (GUI) featuring windows, buttons, scroll bars, mice, and more. This occurred during an era when operating systems only had command-line interfaces, and programs primarily featured a textual interface. In other words, screens were a matrix of characters, often with 25 lines and 80 columns, for instance.

MVC was the architectural pattern chosen by Smalltalk designers for the implementation of graphical interfaces. Specifically, MVC defines that the classes of a system should be organized into three groups:

  • View: Classes responsible for implementing the system’s graphical interface, including windows, buttons, menus, scroll bars, etc.

  • Controllers: Classes that handle events generated by input devices, such as the mouse and keyboard. As a result of such events, the Controllers can request a change in the state of the Model or the View. For example, consider a calculator. When the user clicks on the + button, a Controller class should capture this event and call a method of the Model. As a second example, when the user clicks on the Dark UI button, it is also up to a Controller class to request the View to change the colors to darker shades.

  • Model: Classes that store the data manipulated by the application and that are related to the system’s domain. Thus, these classes have no knowledge or dependency on View and Controller classes. In addition to data, they can contain methods that perform operations on domain objects.

Therefore, in MVC architectures, the graphical interface is formed by View objects and Controllers. However, in many systems, there is no clear distinction between such components. According to Fowler (link, page 331), even the majority of Smalltalk implementations do not separate these two components. Thus, it’s easier to understand MVC in this way:

MVC = (View + Controllers) + Model = Graphical Interface + Model

The next figure shows the dependencies between the classes in an MVC architecture. The figure re-emphasizes that the graphical interface is composed of the View and Controllers. We can also observe that the Graphical Interface depends on the Model. However, the Model does not have dependencies on Graphical Interface classes. Indeed, we can understand the Graphical Interface as an observer of the Model. When the state of the Model changes, the graphical interface should be updated.

MVC Architecture

Among the advantages of MVC architectures, we can list:

  • MVC encourages the specialization of development work. For example, we can have developers who are experts in implementing user interfaces, nowadays called front-end developers. On the other hand, the developers responsible for the implementation of Model classes do not need to concern themselves with user interface code.

  • MVC allows Model classes to be used by different Views, as illustrated in the next figure. In this example, a Model object stores two values: hours and minutes. These data are presented in two different Views. The first is an analog clock, and the second is a digital clock.

MVC system with two Views

  • MVC enhances testability. As we will study in Chapter 8, non-visual objects (that is, those not related to the implementation of graphical interfaces) are easier to test. Hence, by separating View from Model objects, it becomes easier to test the latter.
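
The clock example can be sketched as follows, under the assumption that Views register themselves as observers of the Model. The class names are hypothetical, and the Views only compute what they would draw instead of rendering to a screen. Note that `ClockModel` contains no reference to any View class, which is what allows it to be tested in isolation.

```javascript
// Model: stores hours and minutes and notifies observers when they change.
class ClockModel {
  constructor(hours, minutes) {
    this.hours = hours;
    this.minutes = minutes;
    this.observers = [];
  }
  subscribe(observer) {
    this.observers.push(observer);
  }
  setTime(hours, minutes) {
    this.hours = hours;
    this.minutes = minutes;
    this.observers.forEach((notify) => notify(this));
  }
}

// First View: a digital clock, represented here by the text it would show.
class DigitalView {
  constructor(model) {
    model.subscribe((m) => this.render(m));
    this.render(model);
  }
  render(m) {
    const pad = (n) => String(n).padStart(2, "0");
    this.text = `${pad(m.hours)}:${pad(m.minutes)}`;
  }
}

// Second View: an analog clock, represented by the angles of its hands.
class AnalogView {
  constructor(model) {
    model.subscribe((m) => this.render(m));
    this.render(model);
  }
  render(m) {
    this.minuteAngle = m.minutes * 6; // 360 degrees / 60 minutes
    this.hourAngle = (m.hours % 12) * 30 + m.minutes * 0.5;
  }
}

const clock = new ClockModel(9, 5);
const digital = new DigitalView(clock);
const analog = new AnalogView(clock);
clock.setTime(15, 30); // both Views are updated from the same Model
console.log(digital.text, analog.hourAngle, analog.minuteAngle); // 15:30 105 180
```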

We conclude with a summary of MVC, according to Fowler and Beck (link, Chapter 12, page 370):

The gold at the heart of MVC is the separation between the user interface code (the view, these days often called the presentation) and the domain logic (the model). The presentation classes contain only the logic needed to deal with the user interface. Domain objects contain no visual code but all the business logic. This separates two complicated parts of the program into pieces that are easier to modify. It also allows multiple presentations of the same business logic.

Frequently Asked Questions

What is the difference between MVC and three-tier architectures? The answer is a bit long, and it draws on the historical evolution of these architectures.

First, and as we mentioned, MVC emerged in the late 70s to support the construction of graphical interfaces, i.e., applications that include an interface with windows, buttons, text boxes, etc. An example is an office package, with applications such as Word, Excel, and PowerPoint.

In the 90s, network technologies, distributed systems, and databases became common. They enabled the construction of distributed applications with three tiers. In this case, MVC can be used in the implementation of the interface layer, which could, for example, be a Windows application. In essence, the application, as a whole, follows a three-tier architecture but uses MVC in the user interface layer.

At the beginning of the 2000s, the Web became widespread, and user interfaces migrated to HTML and later to HTML and JavaScript. The confusion between the terms MVC and three-tier arose at this time, primarily due to the appearance of frameworks for implementing web systems that called themselves MVC frameworks. Examples include Spring (for Java), Ruby on Rails, Django (for Python), and Laravel (for PHP). Actually, these frameworks adapted the MVC concepts for the Web. For example, they enforce the organization of a web app into three parts (see the next figure): View, composed of HTML pages; Controllers, which process a request and generate a new View as a response; and Model, which is the layer that persists the data in a database.

Web MVC Architecture

Consequently, although web systems are similar to three-tier systems, the most popular web frameworks decided to use typical MVC terms to name their components. Therefore, the best way to answer the question is to affirm that there are two versions of MVC: the traditional version, which came about with Smalltalk-80, and the Web version, which became common in the early 2000s. This last version greatly resembles three-tier architectures.
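
To make the Web version of MVC more concrete, the following sketch shows the three parts without relying on any particular framework. All names are hypothetical, the request is a plain object instead of a real HTTP request, and an in-memory `Map` stands in for the database.

```javascript
// Model: the layer that persists data (a Map stands in for a database).
const gradeModel = {
  rows: new Map([["alice", 18]]),
  find(student) {
    return this.rows.get(student);
  },
};

// View: produces the HTML page returned to the browser.
// (HTML escaping is omitted to keep the sketch short.)
function gradeView(student, grade) {
  return `<html><body><h3>${student}</h3><p>Grade: ${grade}</p></body></html>`;
}

// Controller: processes a request and generates a new View as the response.
function gradeController(request) {
  const student = request.params.student;
  const grade = gradeModel.find(student);
  if (grade === undefined) {
    return { status: 404, body: "<html><body>Not found</body></html>" };
  }
  return { status: 200, body: gradeView(student, grade) };
}

const response = gradeController({ params: { student: "alice" } });
console.log(response.status, response.body);
```

In frameworks such as the ones listed above, the routing of URLs to controllers and the mapping of the Model to database tables are typically handled by the framework itself.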

7.3.1 Example: Single Page Applications

In a traditional web application, with forms, menus, and buttons, every time the user generates an event—such as clicking on a Save button—an interaction between the browser and the web server occurs. That is, the browser sends data to the web server, which processes them and returns a new page to be displayed in the browser. These applications are, therefore, less interactive and responsive due to the delay in communication between the browser and the web server.

Recently, a new type of web system, called Single Page Applications (SPAs), has emerged. These applications are more similar to desktop applications than to traditional web applications. At loading time, SPAs load all their code into the browser, including HTML pages, CSS files, and JavaScript code. Therefore, even though the users are using a browser, they have the impression that they are accessing a local application, as the browser’s page does not reload every time they click on a button. Several modern applications are SPAs, including, for example, Gmail. Obviously, there is still a part of the application on the server with which the SPA frequently communicates. For example, when a new email arrives, Gmail updates the list of messages in the inbox. For this to occur automatically, the communication between the SPA and the server must be asynchronous.

There are several frameworks—all based on JavaScript—for implementing SPAs. Next, we show a simple code example using Vue.js.

<html>
<script src="https://unpkg.com/vue@2"></script>

<body>

<h3>A Simple SPA</h3>

<div id="ui">
  Temperature: {{ temperature }}
  <p><button v-on:click="incTemperature">Increment
  </button></p>
</div>

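<script>
// The Vue instance plays the role of the Model: it holds the data
// and the methods that change it.
var model = new Vue({
  el: '#ui',  // attaches this instance to the <div id="ui"> element
  data: {
    temperature: 60
  },
  methods: {
    // Called when the button is clicked; Vue then updates the View.
    incTemperature: function() {
      this.temperature++;
    }
  }
})
</script>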

</body>
</html>

This application displays a temperature on the browser screen and a button to increment it (see figure below).

Example of Single-Page Application

Interestingly, SPAs follow an architecture similar to MVC. In the previous example, the SPA interface, including the View and Controller, is implemented in HTML, specifically in the code delimited by the <div> tag. The Model is implemented in JavaScript, using Vue.js, and is delimited by the <script> tag.

A second interesting point is that Vue.js is responsible for propagating changes in the Model to the View. For example, when the incTemperature method is executed, the temperature value is automatically updated in the interface. The reverse process can also occur, although not exercised in our simple example. This feature of SPA frameworks is called two-way data binding.
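
To give an intuition of how a change in the Model can reach the View automatically, the sketch below intercepts writes to a data object and re-renders a text representation of the interface. It covers only the Model-to-View direction. The names `reactive` and `render` are hypothetical, and the code is a simplified illustration, not Vue.js's actual implementation.

```javascript
// Wraps a data object so that every property write triggers a callback.
function reactive(data, onChange) {
  return new Proxy(data, {
    set(target, key, value) {
      target[key] = value;
      onChange(key, value);
      return true; // signals that the assignment succeeded
    },
  });
}

// Stand-in for the View: a string instead of a real DOM element.
let rendered = "";
function render(state) {
  rendered = `Temperature: ${state.temperature}`;
}

const state = reactive({ temperature: 60 }, () => render(state));
render(state);       // initial rendering
state.temperature++; // the write is intercepted and the View is refreshed
console.log(rendered); // Temperature: 61
```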

7.4 Microservices

As we discussed in Chapter 2, agile methods advocate for rapid iterations with frequent releases in order to obtain feedback and, if necessary, make changes in a software product. However, even if an organization adopts an agile method, such as Scrum, it will face a challenge when it comes to releasing new versions of its software.

This challenge occurs because systems typically follow a monolithic architecture at runtime. That is, even if the development is decomposed into modules M1, M2, M3, …, Mn, at runtime, these modules are executed by the operating system as a single process. Consequently, all modules share the same address space. In other words, during runtime, the system is a large monolith, as illustrated in the next figure.

Monolith with nine modules. At runtime, the system executes as a single process, represented by the outer square around the modules.

In a monolith, there’s always a risk that a change made by a team in a module Mi causes a bug in a module Mj. For example, Mi and Mj may share a global variable or a static attribute. Thus, a change in this variable, made in Mi, may compromise the behavior of Mj.
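
The following sketch illustrates this risk with two hypothetical modules that run in the same process and share a global configuration object. The module and variable names are assumptions made for the example.

```javascript
// Shared mutable state, visible to every module in the process.
const globalConfig = { currency: "USD" };

// Module Mi: a hypothetical change switches the currency for its own report.
const moduleMi = {
  printLocalReport() {
    globalConfig.currency = "EUR";
    return `report in ${globalConfig.currency}`;
  },
};

// Module Mj: silently depends on the same variable.
const moduleMj = {
  formatPrice(amount) {
    return `${amount.toFixed(2)} ${globalConfig.currency}`;
  },
};

const before = moduleMj.formatPrice(10);
moduleMi.printLocalReport(); // side effect on the shared variable
const after = moduleMj.formatPrice(10);
console.log(before, "->", after); // 10.00 USD -> 10.00 EUR
```

The team responsible for Mj did not change a single line of code, yet the behavior of their module changed.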

To prevent customers from being surprised with unexpected bugs in their systems, organizations using monolithic architectures adopt a strict and bureaucratic process for releasing new versions. This process may even include manual testing before the system is released to production. By manual tests, we mean a tester executing the system’s most relevant functionalities to simulate a usage session by an end user.

To solve this problem—where development has become agile but the release process remains bureaucratic—organizations have begun to migrate their monoliths to an architecture based on microservices. The idea is simple: groups of modules will now run as separate processes, without sharing memory. In other words, the system is decomposed into modules not just during development but also at runtime. With this, the chances of changes in one module causing bugs in other modules are reduced.

When the modules are separated into distinct processes, they cannot access an internal resource of another module, such as a global variable, a static attribute, or an internal interface. Instead, by design, all communication must occur through the public interfaces of the modules. In other words, microservices are used to ensure that development teams only use the public interfaces of the systems they depend on. Compliance with this rule is ensured by the operating system.

The next figure shows a microservices-based implementation of our initial example. In this new architecture, we still have nine modules, but they are run by six separate processes, represented by the squares or rectangles around the modules. Modules M1, M2, M3, and M6 are each executed in an independent process. Modules M4 and M5 are executed in a fifth process. Lastly, modules M7, M8, and M9 are executed in a sixth process.

Architecture with six microservices: M1, M2, M3, M4-M5, M6, M7-M8-M9. Each microservice runs as an autonomous process.

Up to this point, we’ve used the term process, but the pattern name refers to them as services. Also, the services are micro because they do not implement complex features. Remember that they are implemented and maintained by agile teams, which, as we mentioned in Chapter 2, are small. Consequently, small teams don’t have the capacity to implement large services with multiple features.

A second advantage of microservices is scalability. When a monolith faces performance issues, one solution is to replicate the system on different machines, as shown in the next figure. This solution is called horizontal scalability. For example, it allows the users to be divided among the two instances presented in the figure. Since they are a monolith, both instances are identical, that is, they have exactly the same modules.

Server 1 runs a monolith in a single process, and Server 2 runs a replica of this monolith.

However, the performance issues may be caused by specific services; for example, by the authentication service. Thus, microservices allow the specific components related to such performance issues to be replicated. The next figure shows a new deployment of our microservices-based system.

Server 1 runs all the microservices, except M1. Server 2 runs six processes, all of them implementing M1.

In this figure, the second server includes only instances of M1. The assumption is that M1 is responsible for most of the performance problems of the initial installation. In the first installation, we have a single instance of M1. Now, we have six instances, all of them running on server 2.

So far, we listed two advantages of microservices: (1) they allow a system to evolve faster and independently, allowing each team to adopt its own release schedule; (2) they support scalability at a finer granularity level than what is possible with monoliths. But there are at least two other advantages:

  • Since microservices are autonomous and independent, they can be implemented in different technologies, including programming languages, frameworks, and databases. In an e-commerce system, for example, the customer registration microservice can be implemented in Java with a relational database. Meanwhile, the microservice that provides purchase recommendations can be implemented in Python with a NoSQL database.

  • When using a monolith, failures are total. If the database crashes, all services go down. On the other hand, in microservice-based architectures, we can have partial failures. For example, suppose that the recommendation microservice in our e-commerce system is down. Customers will still be able to search for products, make purchases, etc. But they will receive a message in the recommendation area of the page saying that recommendations are not available at the moment.
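
The partial failure scenario described above can be sketched as follows. The functions `fetchRecommendations` and `searchProducts` are hypothetical stand-ins for calls to two microservices. In a real system they would be asynchronous network requests, but they are synchronous here to keep the example short.

```javascript
// Hypothetical client for the recommendation microservice. It always throws
// here to simulate the service being down.
function fetchRecommendations(customerId) {
  throw new Error("recommendation service unavailable");
}

// Hypothetical client for the product search microservice, which is up.
function searchProducts(query) {
  return [`${query} A`, `${query} B`];
}

// Builds the data for a store page. A failure in one microservice is
// contained, and the rest of the page is produced normally.
function renderStorePage(customerId, query) {
  const products = searchProducts(query);
  let recommendations;
  try {
    recommendations = fetchRecommendations(customerId);
  } catch (err) {
    recommendations = "Recommendations are not available at the moment.";
  }
  return { products, recommendations };
}

const page = renderStorePage(42, "laptop");
console.log(page.products, page.recommendations);
```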

Microservice-based architectures have become popular due to the emergence of cloud computing platforms. With these platforms, organizations no longer need to purchase and maintain hardware and basic software, such as operating systems, databases, and web servers. Instead, they can rent a virtual machine on a cloud platform and pay per hour of machine usage. This makes it easier to scale a microservice horizontally by adding new virtual machines.

In-Depth: Microservices are an example of the application of Conway’s Law. Formulated in 1968 by Melvin Conway, it is one of the empirical laws on Software Engineering, much like Brooks’ Law, which we studied in Chapter 1. Conway’s Law suggests the following: companies tend to adopt software architectures that mirror their organizational structures. In other words, a company’s software architecture tends to reflect its organizational chart. That’s why it’s not a coincidence that microservices are primarily used by large internet companies that have hundreds of small development teams distributed across various countries. Aside from being decentralized, these teams are autonomous and are constantly encouraged to produce innovations.

7.4.1 Data Management

Ideally, microservices should also be autonomous in terms of data. That is, they should manage the data associated with their service. Hence, the scenario illustrated by the following figure, with two microservices sharing the same database, isn’t recommended in a microservice-based architecture.

Ideally, M1 and M2 should be independent even from the standpoint of databases, as shown in the next figure. The reason is that when you have a single database, it too can become a bottleneck to the system’s evolution.

For example, traditional development teams and architectures often share a database administrator, who is responsible for managing the database model. Any change to the database—like creating a new column in a table—needs approval from this administrator. Therefore, that central authority has to reconcile the interests, often conflicting ones, of the different teams. Consequently, their decisions might become slow and bureaucratic, delaying the system’s evolution.

7.4.2 When Not to Use Microservices?

Up until this point, we’ve presented the advantages of microservices. But it’s important to note that this architecture is more complex than a monolithic one. The reason is that microservices are independent processes, which results in a distributed system. Thus, when using microservices, we face the challenges that characterize such systems. Among them, we can mention:

  • Complexity: When two modules run in the same process, communication between them occurs through method calls. But when they are on different machines, the communication must use some communication protocol, such as HTTP. For this reason, developers who work with microservices need to master and use a set of technologies for communication over networks.

  • Latency: The communication between microservices also involves more delay, which we call latency. When a client calls a method in a monolithic system, the latency is minimal. For example, it is rare for a developer to avoid calling a method to improve the performance of their code. However, this scenario changes when the method is on another machine, perhaps on the other side of the planet in the case of a global company. In these situations, there is a non-negligible communication cost. Regardless of the communication protocol used, the call has to pass through the network cable (or through the air and the optical fiber) until it reaches the destination machine.

  • Distributed Transactions: As we’ve seen, microservices should be autonomous in terms of data as well. However, this makes it more complex to ensure that operations performed on two or more databases are atomic, that is, either they execute successfully in all databases or they fail in all of them. For example, let’s assume two credit card payment microservices, which we’ll call X and Y. Assume that an e-commerce site allows the purchase value to be split between these cards. For example, a $500.00 purchase may be paid by debiting $300.00 on card X and $200.00 on card Y. However, these transactions should be atomic: either both cards are debited, or neither is. Therefore, in microservice-based architectures, distributed transaction protocols, like two-phase commit, may be necessary to guarantee transaction semantics in operations that write on more than one database.
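
The split payment example can be used to sketch the basic idea behind two-phase commit. The `CardService` class and the `splitPayment` coordinator below are hypothetical and run in a single process. A real protocol also has to deal with crashes, timeouts, and durable logs, which are ignored here.

```javascript
// Hypothetical payment microservice with its own data (limit and debits).
class CardService {
  constructor(name, limit) {
    this.name = name;
    this.limit = limit; // total credit available on the card
    this.held = 0;      // amount reserved during phase 1
    this.debited = 0;
  }
  // Phase 1: vote "yes" only if the amount can be reserved.
  prepare(amount) {
    if (amount > this.limit - this.debited) return false;
    this.held = amount;
    return true;
  }
  // Phase 2 (success): make the reserved debit permanent.
  commit() {
    this.debited += this.held;
    this.held = 0;
  }
  // Phase 2 (failure): release the reservation.
  rollback() {
    this.held = 0;
  }
}

// Coordinator: commits only if every participant voted "yes".
function splitPayment(parts) {
  const prepared = [];
  for (const { service, amount } of parts) {
    if (!service.prepare(amount)) {
      prepared.forEach((s) => s.rollback());
      return false;
    }
    prepared.push(service);
  }
  prepared.forEach((s) => s.commit());
  return true;
}

const cardX = new CardService("X", 1000);
const cardY = new CardService("Y", 100); // not enough credit for $200.00
const purchase = [
  { service: cardX, amount: 300 },
  { service: cardY, amount: 200 },
];
const firstAttempt = splitPayment(purchase);
console.log(firstAttempt, cardX.debited, cardY.debited); // false 0 0

cardY.limit = 500; // suppose the limit of card Y is raised
const secondAttempt = splitPayment(purchase);
console.log(secondAttempt, cardX.debited, cardY.debited); // true 300 200
```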

7.5 Message-Oriented Architectures

In this type of architecture, communication between clients and servers is mediated by a third-party service that implements a message queue, as the next figure shows.

Message-oriented architecture

Clients act as message producers; that is, they insert messages into the queue. And servers act as message consumers; that is, they retrieve messages from the queue and process the information contained in them. A message is a record (or an object) with a set of values. And a message queue is a FIFO-type structure (first in, first out), meaning the first message to enter the queue is the first to be processed by the server.
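
A minimal in-memory sketch of these roles is shown next. The `MessageQueue` class is hypothetical. Production message brokers are separate services that also provide persistence, which this example does not attempt to reproduce.

```javascript
// FIFO structure: messages leave the queue in the order they entered it.
class MessageQueue {
  constructor() {
    this.messages = [];
  }
  enqueue(message) {
    this.messages.push(message);
  }
  dequeue() {
    return this.messages.shift(); // undefined when the queue is empty
  }
}

const queue = new MessageQueue();

// Client (producer): inserts messages, which are plain records with values.
queue.enqueue({ orderId: 1, item: "book" });
queue.enqueue({ orderId: 2, item: "pen" });

// Server (consumer): retrieves and processes the messages.
const processed = [];
let message;
while ((message = queue.dequeue()) !== undefined) {
  processed.push(message.orderId);
}
console.log(processed); // [ 1, 2 ]
```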

By using message queues, communication becomes asynchronous, as once the information is placed in the queue, the client is free to continue its processing. Therefore, it is important that the messaging service is provided by a stable machine with high processing power. It is also important for the message queue to be persistent. If the queue goes down, the data must not be lost. As message queues are widely used in the implementation of distributed systems, there are ready-made solutions in the market. That is, you probably won’t implement your own message queue but reuse solutions from well-known companies or those maintained by open source communities. Sometimes, message queues are also called message brokers.
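The asynchronous, FIFO behavior described above can be sketched with Python's standard queue and threading modules. This is only an illustration under simplifying assumptions: the queue lives in memory inside a single process (so it is not persistent), and the message fields and the None stop marker are invented for the demo.

```python
import queue
import threading

message_queue = queue.Queue()   # in-memory FIFO standing in for a broker
processed = []

def server():
    # Consumer: take messages in arrival order until a stop marker arrives.
    while True:
        message = message_queue.get()
        if message is None:
            break
        processed.append(message["order_id"])

def client():
    # Producer: enqueue each message and move on without waiting for the server.
    for order_id in (1, 2, 3):
        message_queue.put({"order_id": order_id})
    message_queue.put(None)     # hypothetical stop marker, for the demo only

worker = threading.Thread(target=server)
worker.start()
client()
worker.join()
print(processed)   # [1, 2, 3]
```

In a real deployment, the queue would be a separate, persistent service, and the client and the server would be different processes.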

In addition to enabling asynchronous communication between clients and servers, message queues enable two forms of decoupling among the components of a distributed application:

  • Space decoupling: Clients do not need to know the servers, and vice versa. In other words, the client is an information producer but does not need to know who will consume this information. The reverse reasoning applies to servers.

  • Time decoupling: Clients and servers do not need to be simultaneously available to communicate. If the server is down, clients can continue producing messages and placing them in the queue. When the server comes back, it will process these messages.

Space decoupling makes solutions based on message queues quite flexible. Development teams, both for the client and the server, can work and evolve their systems autonomously. Delays from one team do not affect the work of other teams, for instance. For this, it is sufficient that the message format remains stable over time. On the other hand, time decoupling makes the architecture robust to failures. For example, server failures do not have an impact on clients. However, it is essential that the message broker is stable and capable of storing a large number of messages, as we said before. To ensure the availability of these brokers, they are usually managed by specialized infrastructure teams.

Message queues also make it easier to scale a distributed system. To do this, we just need to configure multiple servers consuming messages from the same queue, as the next figure shows.

Message queue with multiple servers
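As a rough sketch of this configuration, the Python code below starts three server threads that consume from the same in-memory queue. The server names, the integer messages, and the None stop markers are assumptions made for the example. Each message is taken by exactly one server, but which one is not deterministic.

```python
import queue
import threading

message_queue = queue.Queue()
handled_by = {}                 # message id -> name of the server that took it
lock = threading.Lock()

def server(name):
    while True:
        message = message_queue.get()
        if message is None:     # stop marker: one is enqueued per server
            break
        with lock:
            handled_by[message] = name

servers = [threading.Thread(target=server, args=(f"server-{i}",))
           for i in range(3)]
for s in servers:
    s.start()

for message_id in range(10):
    message_queue.put(message_id)
for _ in servers:
    message_queue.put(None)
for s in servers:
    s.join()

print(sorted(handled_by))       # every message id appears exactly once
```

Adding capacity amounts to starting more consumers; the producer code does not change.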

7.5.1 Example: Telecommunication Company

Suppose a telecommunication company has two main systems: customers and engineering. The customers system is responsible for interacting with the company’s customers, for example, to sell internet packages. On the other hand, the engineering system is responsible for activating and configuring the services that have been sold. This involves configuring the hardware of the company, such as routers. Therefore, when a new customer purchases a service, it has to be provisioned in the engineering system.

This company may use a message queue to mediate the communication between both systems. Upon selling a new package of services, the customers system places a message in the queue containing the package information. It is then the responsibility of the engineering system to process this message and activate the new customer.

When opting for a message queue architecture, the communication between the systems does not occur in real time. For example, if the engineering system is busy with several complex service activations, it may take a while until a certain service is activated. On the other hand, a message queue solution allows services to be activated more quickly than a batch solution. In a batch solution, the customers system generates a file with the services sold each day. This file is processed during the night by the engineering system. Therefore, a customer might have to wait up to 24 hours to have their service activated.

7.6 Publish/Subscribe Architecture

In publish/subscribe architectures, messages are referred to as events. The components of the architecture are known as publishers and subscribers. Publishers produce events and publish them in the publish/subscribe service, which is usually executed on a separate machine. Subscribers should subscribe to events of their interest. When an event is published, its subscribers are notified, as shown in the next figure.

Publish/Subscribe architecture

Just like when using message queues, publish/subscribe architectures also provide space and time decoupling. However, there are two major differences between publish/subscribe and message queues:

  • In publish/subscribe, an event generates notifications to all subscribers. On the other hand, in message queues, the messages are consumed, i.e., removed from the queue, by a single server. Therefore, in publish/subscribe, we have a style of communication from 1 to n, also known as group communication. In message queues, the communication is 1 to 1, also called point-to-point communication.

  • In publish/subscribe, subscribers are asynchronously notified. First, they subscribe to certain events and then continue their processing. When the events of interest occur, they are notified through the execution of a specific method. On the other hand, when using a message queue, the servers have to pull the messages from the queue.

In publish/subscribe systems, events are organized into topics, which function as event categories. When a publisher produces an event, it must specify its topic. This allows subscribers to register only for events of a particular topic.
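The following Python sketch illustrates topics and 1-to-n notification. It is a simplified, in-process stand-in for a real publish/subscribe service: the EventBroker class, its subscribe and publish methods, and the topic names are hypothetical.

```python
class EventBroker:
    """Hypothetical in-process publish/subscribe service."""

    def __init__(self):
        self._subscribers = {}   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, event):
        # 1-to-n: every subscriber of the topic is notified (push model).
        for callback in self._subscribers.get(topic, []):
            callback(event)


broker = EventBroker()
received = []
broker.subscribe("sales", lambda e: received.append(("A", e)))
broker.subscribe("sales", lambda e: received.append(("B", e)))
broker.subscribe("refunds", lambda e: received.append(("C", e)))

broker.publish("sales", {"amount": 100})
print(received)   # [('A', {'amount': 100}), ('B', {'amount': 100})]
```

Subscriber C is not notified because it registered for a different topic.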

Publish/subscribe architectures are sometimes referred to as event-oriented architectures. The publish/subscribe service is also sometimes called an event broker. It’s also important to mention that a publish/subscribe architecture shares similarities with the Observer design pattern, which we studied in Chapter 6. However, publish/subscribe is an architectural solution for implementing distributed systems. In this context, publishers and subscribers are different processes and, most of the time, located on distinct machines. On the other hand, the Observer design pattern was not proposed for use in distributed architectures.

7.6.1 Example: Airline Company

Let’s illustrate a publish/subscribe architecture using the systems of an airline as an example. Consider this company has a sales system used by customers to purchase airline tickets. After completing a sale, the system generates an event containing all transaction data, including date, time, flight number, and passenger name. The following figure illustrates the proposed architecture for the system.

Pub/Sub architecture in an airline

Three systems of the airline subscribe to the sale event: (1) the mileage system, as the miles related to the ticket should be credited to the passenger’s account; (2) the marketing system, which can use the sale data to make offers to customers, such as car rentals or upgrades to business class; (3) the accounting system, as the sale must be included in the company’s accounting.

This architecture has the following interesting characteristics: (1) group communication because the same event is subscribed to by three systems; (2) space decoupling because the sales system does not know which systems are interested in the events it publishes; (3) time decoupling because the publish/subscribe system resends the events if the subscribing systems are down; (4) asynchronous notification because the subscribers do not need to periodically query the publish/subscribe system about the events of interest.
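A possible sketch of this scenario in Python is shown next. It runs in a single process and only simulates systems being up or down; the Subscriber and PubSubService classes, the online flag, and the sample sale data are all assumptions made for illustration.

```python
class Subscriber:
    """Hypothetical airline system that can be up or down."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.events = []

    def notify(self, event):
        self.events.append(event)


class PubSubService:
    def __init__(self):
        self.subscribers = []
        self.pending = {}        # subscriber name -> events awaiting delivery

    def subscribe(self, subscriber):
        self.subscribers.append(subscriber)
        self.pending[subscriber.name] = []

    def publish(self, event):
        # The publisher does not know who the subscribers are.
        for sub in self.subscribers:
            self.pending[sub.name].append(event)
        self.deliver()

    def deliver(self):
        # Push pending events to every subscriber that is currently up.
        for sub in self.subscribers:
            if sub.online:
                for event in self.pending[sub.name]:
                    sub.notify(event)
                self.pending[sub.name] = []


mileage = Subscriber("mileage")
marketing = Subscriber("marketing")
accounting = Subscriber("accounting")
service = PubSubService()
for system in (mileage, marketing, accounting):
    service.subscribe(system)

accounting.online = False                        # accounting is down
sale = {"flight": "XY123", "passenger": "Ana"}   # made-up sale event
service.publish(sale)
print(len(mileage.events), len(accounting.events))   # 1 0

accounting.online = True                         # accounting comes back
service.deliver()
print(len(accounting.events))                    # 1
```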

7.7 Other Architectural Patterns

Pipes and Filters is a data-oriented architectural pattern in which programs—called filters—process the data received on the input and generate a new output. Filters are connected through pipes, which act as buffers, storing the output data until it is consumed by the next filter in the sequence. This way, filters don’t know their predecessors and successors, making this architecture flexible and allowing various program combinations. Additionally, filters can be executed in parallel. A classic example of a pipe and filter-based architecture is Unix system commands. For instance:

ls | grep csv | sort

requests the execution of three commands (filters) that are connected by two pipes (vertical bars). In the case of Unix commands, inputs and outputs are text files.
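The same structure can be imitated inside a single program. The Python sketch below models each filter as a generator function and composes them like the command above. The function names mirror the Unix commands only for readability, and the file list is made up. Unlike Unix processes, these generators run lazily in one thread rather than in parallel.

```python
def ls(names):
    # Source filter: emits one file name at a time.
    yield from names

def grep(pattern, lines):
    # Keeps only the lines that contain the pattern.
    return (line for line in lines if pattern in line)

def sort(lines):
    # Must read all of its input before producing any output.
    yield from sorted(lines)

files = ["b.csv", "notes.txt", "a.csv"]   # hypothetical directory listing
pipeline = sort(grep("csv", ls(files)))
print(list(pipeline))   # ['a.csv', 'b.csv']
```

Because each filter only depends on an iterable of lines, filters can be recombined freely, for example by dropping grep or adding another stage.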

Client/Server is a very common architecture when implementing network services. Clients and servers are the only possible modules in this architecture, and they communicate through a network. Clients request services from the servers and wait for processing. Client/Server architectures are used to implement services like the following: (1) print service, which enables clients to print to a remote printer that is not physically connected to their machine; (2) file service, which enables clients to access the file system (that is, the disk) from a server machine; (3) database service, which allows clients to access a database located on another machine; (4) Web service, which allows clients (in this case, browsers) to access resources (in this case, HTML pages) provided by a web server.
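As a minimal sketch of the request/response interaction, the Python code below runs a tiny TCP server and a client on the same machine using the standard socket module. The echo "service", the single-request server, and the message contents are assumptions chosen to keep the example short; a real server would loop and handle many clients.

```python
import socket
import threading

def serve_once(server_socket):
    # Server: accept one client, read its request, send back a response.
    connection, _ = server_socket.accept()
    with connection:
        request = connection.recv(1024).decode()
        connection.sendall(f"echo: {request}".encode())

server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_socket.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
server_socket.listen(1)
port = server_socket.getsockname()[1]

thread = threading.Thread(target=serve_once, args=(server_socket,))
thread.start()

# Client: connect, send a request, and block until the response arrives.
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"hello")
    response = client.recv(1024).decode()

thread.join()
server_socket.close()
print(response)
```

Note that the client waits for the server's answer, in contrast with the asynchronous communication provided by message queues.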

Peer-to-peer architectures are distributed architectures in which each module can play both the client and the server role. In other words, these modules—called peers—are both consumers and service providers. For example, BitTorrent is a peer-to-peer protocol for sharing files on the Internet. Applications that implement the protocol can both provide files to the network and download files from other peers.
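The dual role of a peer can be sketched as follows in Python. This in-memory model is not the BitTorrent protocol; the Peer class, its serve and download methods, and the file contents are hypothetical and only show that the same module acts as both client and server.

```python
class Peer:
    """Hypothetical peer: serves its own files and downloads from others."""

    def __init__(self, name, files=None):
        self.name = name
        self.files = dict(files or {})   # file name -> content
        self.neighbors = []

    def connect(self, other):
        self.neighbors.append(other)
        other.neighbors.append(self)

    def serve(self, file_name):
        # Server role: answer a request from another peer.
        return self.files.get(file_name)

    def download(self, file_name):
        # Client role: ask each neighbor until one has the file.
        for peer in self.neighbors:
            content = peer.serve(file_name)
            if content is not None:
                self.files[file_name] = content   # this peer can now serve it too
                return True
        return False


alice = Peer("alice", {"song.mp3": b"data"})
bob = Peer("bob")
carol = Peer("carol")
alice.connect(bob)
bob.connect(carol)

print(bob.download("song.mp3"))     # True: obtained from alice
print(carol.download("song.mp3"))   # True: obtained from bob
```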

7.8 Architectural Anti-Patterns

Let’s conclude this chapter with a description of an architectural anti-pattern, that is, an architectural organization that is not recommended. Perhaps the best-known anti-pattern is called Big Ball of Mud. This anti-pattern, as proposed by Brian Foote and Joseph Yoder, describes systems in which a module can communicate with any other module, as the next figure suggests. That is, a Big Ball of Mud does not have a defined architecture. Instead, there is an explosion in the number of dependencies, which gives rise to spaghetti code. Consequently, maintenance and evolution become very difficult and risky.

Big Ball of Mud anti-pattern

Real World: In an article published in 2009, Santonu Sarkar and five colleagues, at the time consultants at the Indian company Infosys, describe an experience of modularization of a large banking system. The system was implemented in the late 1990s and since then it had grown 10-fold: from 2.5 million to more than 25 million lines of code. According to the authors, the development teams had several hundred engineers. Although the authors did not use the term, the article characterizes the architecture of this banking system as a Big Ball of Mud. For example, the authors mention that a single source directory contained almost 15 thousand files. Next, they analyze the problems of maintaining this system: (1) the learning time of new engineers kept increasing, going from three to seven months in a five-year span; (2) frequently, bug fixes introduced new bugs into the code; (3) the time to implement new features, even simple ones, was also increasing considerably.

It may seem that systems like the one described in this article are exceptions. However, they are more common than we might imagine. And the root of the problem lies in letting the code turn into a Big Ball of Mud. Interestingly, the bank tried to work around this problem by adopting practices such as detailed documentation, code reviews, and pair programming. However, none of them was capable of fixing the problems caused by the Big Ball of Mud architecture.

Bibliography

James Lewis, Martin Fowler. Microservices: a definition of this new architectural term. Blog post, 2014.

Martin Fowler. Patterns of Enterprise Application Architecture. Addison-Wesley, 2002.

Martin Fowler. Who Needs an Architect? IEEE Software, vol. 20, issue 5, p. 11-13, 2003.

Patrick Eugster et al. The many faces of publish/subscribe. ACM Computing Surveys, vol. 35, issue 2, p. 114-131, 2003.

Glenn Krasner, Stephen Pope. A cookbook for using the model-view controller user interface paradigm in Smalltalk-80. Journal of Object-Oriented Programming, vol. 1, issue 3, p. 26-49, 1988.

Kevlin Henney, Frank Buschmann, Douglas Schmidt. Pattern-Oriented Software Architecture: A Pattern Language for Distributed Computing, vol. 4, John Wiley & Sons, 2007.

Exercises

1. Given their complexity, database systems are relevant components in the architecture of any type of system. True or false? Justify your answer.

2. Describe three advantages of MVC architectures.

3. What is the difference between the Controller classes in a traditional MVC architecture and the Controller classes in a web system implemented using an MVC framework like Ruby on Rails?

4. Describe four advantages of microservices.

5. Why aren’t microservices a silver bullet? That is, describe at least three disadvantages of using microservices.

6. Explain the relationship between Conway’s Law and microservices.

7. Explain what space and time decoupling mean. Why do message queues and publish/subscribe architectures offer these forms of decoupling?

8. When should a company consider using message queues or a publish/subscribe architecture?

9. Explain the concept of topics in a publish/subscribe architecture.