Software Engineering: A Modern Approach

Marco Tulio Valente

4 Models

All models are wrong, but some models are useful. So the question you need to ask is not “Is the model true?” (it never is) but “Is the model good enough for this particular application?” – George Box

This chapter starts with a presentation about software models (Section 4.1). Then, we provide an overview of UML, which is the most widely used graphical notation for building software models (Section 4.2). We also make it clear that we will study UML for creating software sketches, rather than detailed technical blueprints. Afterwards, we explore four UML diagrams: Class Diagrams (Section 4.3), Package Diagrams (Section 4.4), Sequence Diagrams (Section 4.5), and Activity Diagrams (Section 4.6).

4.1 Introduction

As we discussed in the previous chapter, requirements document what a system should do, using a level of abstraction close to the problem and to the users. Conversely, the source code offers a concrete, low-level, and executable representation of the system’s behavior. Thus, there is a gap between these two worlds: requirements and source code. Software engineers have attempted to bridge this gap since the inception of the field by creating models. Essentially, software models aim to simplify the understanding and analysis of a system. For this reason, they offer more details than requirement specifications but are also less complex than the system’s code.

Models are also instrumental in other engineering fields. For example, a civil engineer might create a scale model to demonstrate how the bridge she was hired to build will look. She could then simulate and verify properties of the bridge, such as maximum load, resistance to wind, waves, earthquakes, etc., by creating a mathematical and physical model of it.

Unfortunately, software models—at least to date—have had less impact than the mathematical models widely used in other engineering fields. The reason is that by discarding crucial details, they also eliminate part of the complexity that is essential to the system being modeled. Frederick Brooks comments on this issue in his seminal essay No Silver Bullet (link):

The complexity of software is an essential property, not an accidental one. Hence, descriptions of a software entity that abstract away its complexity often abstract away its essence. For three centuries, mathematics and the physical sciences made great strides by constructing simplified models of complex phenomena, deriving properties from the models, and verifying those properties by experiment. This paradigm worked because the complexities ignored in the models were not the essential properties of the phenomena. It does not work when the complexities are the essence.

The quote that opens this chapter from British statistician George Box is also a reflection on the practical use of models. Although he was probably referring to mathematical models, his insight applies to all forms of models, including software models. According to Box, all models are wrong because they are simplifications or approximations of reality. Therefore, the main issue is assessing whether, despite these simplifications, a model retains its value for studying properties of the object or phenomenon it mimics.

Thus, our first goal in this chapter is to set accurate expectations regarding the study of software models. In particular, models can play a significant role in software design. During requirements specification, the focus is primarily on defining the problem the system will solve. When we move to design activities, attention shifts towards modeling a solution capable of solving it. After this solution is designed, it must be implemented using programming languages, libraries, frameworks, databases, etc.

Specifically, in this chapter, we will study a subset of the diagrams proposed by UML (Unified Modeling Language). We will begin by describing the history and context that led to the creation of this modeling language. Next, we will study some of the most important UML diagrams.

In-Depth: Since the 1970s, researchers have investigated the use of mathematical models in Software Engineering through so-called Formal Methods. These methods use a mathematical notation (based on logic, set theory, or Petri Nets, for example) to derive formal specifications for software systems. These specifications are precise and unambiguous, and can be used to demonstrate properties of a system before it is implemented. For example, it is possible to prove that a concurrent system doesn’t have deadlocks or race conditions. This may seem ambitious, but it is common in other engineering fields. For example, civil engineers have for centuries used mathematical models to ensure, before construction, that a bridge will support a certain load and resist specific weather conditions. However, the use of mathematical specifications in Software Engineering has not advanced as far as in other engineering fields. Consequently, their use today is primarily restricted to mission-critical systems.

4.2 UML for Creating Sketches

The Unified Modeling Language (UML) is a graphical notation for software modeling. The language defines a collection of diagrams to assist in the design of software systems, particularly object-oriented ones. The origins of UML date back to the 1980s, when the object-oriented paradigm was maturing and evolving rapidly. At that time, several object-oriented languages emerged, such as C++, as well as notations for software modeling. Waterfall was the dominant process at the time, requiring a big upfront design phase and the creation of several documents and models, which would be passed on to the programmers to be converted into code.

UML is the result of a combined effort to unify the graphical notations that developers used in the early 1990s. Specifically, the first version of UML was proposed in 1995 to unify notations being developed independently by three software engineers: Grady Booch, Jim Rumbaugh, and Ivar Jacobson. Simultaneously, tools emerged to create and edit UML diagrams, which were referred to as CASE tools (Computer-Aided Software Engineering). The name is inspired by CAD tools (Computer-Aided Design), widely used in traditional engineering fields. The standardization proposed by UML was important to ensure developers could create, access, and edit their models using various CASE tools. Indeed, in 1997, UML became a standard managed by OMG (Object Management Group), an organization funded by software companies.

How to use UML?

Martin Fowler classifies three ways to use UML: as blueprint, as programming language, or as sketch (link). Let’s take a look at each.

UML as blueprint corresponds to the use of UML envisioned by its creators back in the 1990s. In this context, after the requirements specification phase, a set of models—or blueprints—used to be produced to document various aspects of a system. Typically, these models were created by analysts using CASE tools and then handed over to programmers for coding. Thus, UML as blueprint is recommended when using processes like the Rational Unified Process (RUP). Actually, RUP was proposed by software engineers with a strong connection to UML. However, as we discussed in Chapter 2, the use of UML to construct detailed and complete models is increasingly rare. For example, with agile methods, we do not have an upfront and long design phase. Instead, design decisions are proposed and implemented throughout the sprints. Consequently, we will not focus here on using UML as blueprint.

UML as programming language corresponds to the use of UML advocated by OMG after the standardization of the language. Essentially, the goal was to automatically generate code from UML models. This approach is also known as Model-Driven Development (MDD). In an attempt to make MDD viable, UML was expanded and new features and diagrams were introduced in the language. Consequently, the language gained a reputation for being heavy and complex. However, even after the addition of this extra complexity, the use of UML for code generation is not common, at least in the vast majority of systems.

Then, we have the third usage scenario, UML as sketch, which corresponds to the scenario we will study in this book. In this case, we use UML to build light and informal diagrams of parts of a system, hence the name sketch. These diagrams are used for communication between developers, in two main situations:

In both situations, the aim is not to produce detailed and complete models. Thus, there is no need for expensive tools, such as CASE tools. Moreover, the sketches are not used as input for code generation tools. Most of the time, the diagrams are drawn on a board and, later, photographed and erased. Additionally, only a small subset of UML diagrams is used.

As sketches are small and informal, one may question the need for a modeling language in this case. However, it is better to use a notation that has existed for years and that many developers know than to invent one’s own notation.

Specifically, using UML as sketch avoids two extremes. On the one hand, it avoids the rigid, detailed, and systematic use of UML. On the other hand, it avoids the use of an informal notation, whose semantics may not be clear to the team members. Additionally, UML is often used in books, tutorials, and documents to explain design techniques. For example, in Chapter 6, we will use UML to illustrate the mechanics of some design patterns. Thus, if the reader has never had contact with UML, they may have difficulty understanding such diagrams.

To conclude, software models and UML diagrams are used for communication among developers. They are written by developers and for developers. This is distinct from the goal of requirements documents, such as use cases, which are written by developers but can also be read and verified by the users of the system under development.

Real World: In 2013, Sebastian Baltes and Stephan Diehl—both researchers at the University of Trier, in Germany—asked 394 developers to complete a survey on the usage of sketches in software design activities (link). The developers were spread over 32 countries, though the majority were from Germany (54%). The analysis of their responses provides interesting results about the use of sketches in software projects, as described below:

These results highlight the common use of sketches by developers, with nearly half including UML elements, which also reinforces the relevance of studying UML.

UML Diagrams

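UML diagrams are divided into two major categories: static diagrams, which model the structure of the code, and dynamic diagrams, which model its behavior at runtime.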

To better distinguish these categories, static diagrams deal with information that is directly available from the analysis of the code. This information is static because it does not change unless changes are made to the code. Dynamic diagrams, on the other hand, provide a runtime view. They are dynamic because it is common to have different execution flows of the same code. For example, users may run the program with different inputs, select different menu items, etc. In short, if you are interested in representing the structure of the code, you should use static diagrams. If you want to represent the behavior of a program (that is, what can happen during its execution, which methods are actually executed, etc.), you should use a dynamic diagram. Lastly, recall that we already studied Use Case Diagrams, which are considered dynamic diagrams, in Chapter 3 when we introduced techniques for requirements specification.

Notice: There are many versions of UML. For the rest of this chapter, we will use the UML version described in the 3rd edition of the book UML Distilled, by Martin Fowler (link). This book was one of the first to discuss the use of UML as sketches. Specifically, we will study a subset of the diagrams from UML version 2.0. In addition to covering only four diagrams, we will not present every feature of each. Our challenge when writing this chapter was to select the 20% (or less) of UML features that represents 80% (or more) of its use today when drawing sketches. As an illustration of UML’s complexity, the specification of version 2.5.1 of the language has 796 pages.

4.3 Class Diagrams

Class diagrams are the most common UML diagram. They provide a visual representation of a set of classes, offering information about attributes, methods, and relationships that exist among such classes. A class diagram is drawn using rectangles and arrows. Each class is represented by a rectangle with three compartments, as shown in the following figure. These compartments contain the class name (usually in bold), attributes, and methods.

The following diagram presents a real example with classes Person and Phone.

In this diagram, Person is a class with three attributes (firstName, lastName, and phone) and two methods (setPerson and toString). The three attributes are private, as indicated by the - symbol before each one. We also show the type of each attribute. In turn, the two methods are public, as indicated by the + symbol. The diagram has a second class, called Phone, with three private attributes (code, number, and mobile) and three public methods (setPhone, toString, and isMobile). In the case of the methods, we also provide the names of their parameters and the return type.
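To connect the notation with code, the following is a minimal sketch of how these two classes might be written in Java. The diagram only fixes the attribute and method names and their visibility; the attribute types, the parameter lists, and the format returned by toString are assumptions made for this illustration.

```java
// Sketch of the two classes described by the diagram.
// Types and parameter lists are assumptions, not taken from the figure.
class Phone {
    private String code;      // assumed to be an area code
    private String number;
    private boolean mobile;

    public void setPhone(String code, String number, boolean mobile) {
        this.code = code;
        this.number = number;
        this.mobile = mobile;
    }

    public boolean isMobile() {
        return mobile;
    }

    @Override
    public String toString() {
        return "(" + code + ") " + number;
    }
}

class Person {
    private String firstName;
    private String lastName;
    private Phone phone;

    public void setPerson(String firstName, String lastName, Phone phone) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.phone = phone;
    }

    @Override
    public String toString() {
        return firstName + " " + lastName + ", phone: " + phone;
    }
}
```

Each private attribute corresponds to a line marked with - in the attribute compartment, and each public method to a line marked with + in the method compartment.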

This diagram, however, leaves the impression that the classes are islands with no communication between them. Yet one of the main objectives of class diagrams is to visually show the relationships that exist among the classes of a system. Therefore, they can also include arrows to represent three types of relationships: association, inheritance, and dependency. We will describe each of them in the following sections.

4.3.1 Associations

When class A has an attribute b of type B, we say there is an association from A to B, which is represented by an arrow, also from A to B. At the end of the arrow, we indicate the name of the attribute responsible for the association—in our case, b. See the next example (in this example, we only show the information that interests us; therefore, the compartments of attributes and methods are empty):

To make the example clearer, we also show the code for classes A and B:

class A {
   private B b;  // association from A to B
}

class B {
}

Therefore, using associations, we can transform the diagram with the isolated classes Person and Phone into the following one:

The two versions of the diagram are semantically identical. The difference is that in the first version, the classes appear isolated. By contrast, in the second version, it becomes visually clear that there is an association from Person to Phone. To further clarify, in both diagrams, Person has an attribute phone of type Phone. However, in the first version, this attribute is represented in the attribute compartment of the class Person. In the second version, it is represented outside of that compartment—specifically, at the end of the arrow linking Person to Phone. The objective is to emphasize that the attribute belongs to Person, but it points towards a Phone.

Associations often include multiplicity information, which indicates how many objects can be associated with the attribute responsible for the association. The most common multiplicities are 1 (exactly one object), 0..1 (zero or one object), and * (zero or more objects).

In the following example, the multiplicity of the association between Person and Phone is 0..1. This information is indicated above the name of the attribute responsible for the association, in this case, phone. It means that a Person can have zero or only one phone. In programming terms, the attribute phone of Person can have the value null, indicating that the Person in question might not have a Phone. Alternatively, it can refer to just one Phone object.
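In code, a 0..1 multiplicity could be sketched as follows. The attribute phone may be null, and one possible way to make this explicit to callers is to return an Optional from the getter. The getter, the setter, and the number field are assumptions introduced only for this example.

```java
import java.util.Optional;

class Phone {
    private final String number;

    Phone(String number) {
        this.number = number;
    }

    String getNumber() {
        return number;
    }
}

class Person {
    private Phone phone;  // 0..1: null means this Person has no Phone

    void setPhone(Phone phone) {
        this.phone = phone;
    }

    // Wrapping the nullable attribute makes the "zero" case visible to callers.
    Optional<Phone> getPhone() {
        return Optional.ofNullable(phone);
    }
}
```

Returning the attribute directly and documenting that it may be null would model the same multiplicity; Optional is just one design choice.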

In the next example, the semantics are different. In this case, a Person can be associated with multiple Phone objects, even none. This multiplicity is represented by the * above the arrow of the association.

Thus, in this last example, the attribute phone is an array, as illustrated in the following code:

class Person {
   private Phone[] phone;  // *: zero or more Phone objects
}

class Phone {
}

The reader might wonder whether a Person should have a maximum of one Phone (0..1) or any number of Phone objects (*). The answer is simple: it depends on the requirements of the system. Thus, the customers are the ones who should answer this question. For us, what matters is that class diagrams can model both scenarios.

In some cases, multiplicity is also shown at the opposite end of the arrow, as in the next example.

In this diagram, there is a second multiplicity at the opposite arrow end, denoted by the symbol *, indicating that a Phone can be associated with more than one Person. In other words, two distinct people can share the same Phone. However, the association remains unidirectional, since Phone does not have an attribute storing the Person instances it refers to. For this reason, given a Person, we can easily retrieve their Phone by accessing the attribute phone. However, given a Phone, it is not possible to determine, by directly accessing an attribute, which Person instances it refers to.
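The consequence of a unidirectional association can be sketched in code. In the example below, two Person objects share the same Phone, but since Phone has no attribute pointing back, finding the owners of a phone requires scanning all known Person objects. The Directory class, the name attribute, and the method names are hypothetical and serve only this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class Phone {
    // No attribute of type Person: navigation from Phone to Person is not possible.
}

class Person {
    private final String name;
    private final Phone phone;  // navigation from Person to Phone

    Person(String name, Phone phone) {
        this.name = name;
        this.phone = phone;
    }

    String getName() {
        return name;
    }

    Phone getPhone() {
        return phone;
    }
}

class Directory {
    // Without a back-reference in Phone, every Person must be inspected.
    static List<String> ownersOf(Phone phone, List<Person> people) {
        List<String> owners = new ArrayList<>();
        for (Person p : people) {
            if (p.getPhone() == phone) {
                owners.add(p.getName());
            }
        }
        return owners;
    }
}
```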

To conclude, suppose it is important to navigate both ways in the association—from Person to Phone and also from Phone to Person. The solution to this problem is to use a bidirectional association by adding an arrow at each end of the line connecting the classes, as illustrated in the next diagram.

To clarify the semantics of bidirectional associations, we also show the code for both classes:

class Person {
   private Phone phone;     // 0..1 end of the association
}

class Phone {
   private Person[] owner;  // * end of the association
}

In this code, Person has a private attribute phone of type Phone, which could be null; thereby, we comply with the 0..1 end of the bidirectional association. Furthermore, Phone has a private array, named owner, that references Person objects; thus, we also comply with the * end of the same association.
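A practical concern with bidirectional associations is that the two attributes must be kept consistent with each other. The following sketch shows one possible way to do this, assuming a hypothetical setPhone method in Person that updates both ends. For simplicity, it uses a List instead of the array shown above.

```java
import java.util.ArrayList;
import java.util.List;

class Phone {
    private final List<Person> owners = new ArrayList<>();  // * end

    List<Person> getOwners() {
        return owners;
    }
}

class Person {
    private Phone phone;  // 0..1 end

    Phone getPhone() {
        return phone;
    }

    // Updates both ends so that the two attributes never disagree.
    void setPhone(Phone newPhone) {
        if (phone != null) {
            phone.getOwners().remove(this);  // leave the old phone
        }
        phone = newPhone;
        if (newPhone != null) {
            newPhone.getOwners().add(this);  // register with the new phone
        }
    }
}
```

If the two ends were updated by separate, unrelated methods, it would be easy to reach a state where a Person refers to a Phone that does not list that Person as an owner.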

In the last class diagram, we omitted all visibility symbols, both public (+) and private (-). This was done on purpose to emphasize that we are using UML for creating sketches. Therefore, the diagrams do not need to be syntactically perfect. Minor mistakes or omissions are tolerated, especially when they do not affect the purpose of the diagram.

In-Depth: UML, depending on the version in use, admits different notations for associations. For example, sometimes, one provides a name for the association, which is shown along the arrow that connects the two classes. Other times, in the case of bidirectional associations, the two arrows are omitted since the UML standard defines the following: an association in which neither end is marked with a navigability arrow is navigable in both directions. However, these alternative notations tend to be confusing or even ambiguous. For instance, Gonzalo Génova and two researchers from the University of Madrid, Spain, made the following observation about bidirectional associations without arrows: “Unfortunately, this leads to an ambiguity in the graphical notation, because we cannot distinguish between bidirectional associations and associations with unspecified navigability. Or, worse, unspecified associations are assumed to be bidirectional without further analysis” (link, Section 3, fourth paragraph). There are two other concepts frequently mentioned when we study associations in UML: composition and aggregation. In a composition, the destination class cannot exist independently from the source class. On the other hand, when the two classes have independent life cycles, we have an aggregation. In practice, these concepts also generate confusion, and that’s why we did not explore them in this chapter. The same opinion is shared by other authors. For example, Fowler states that aggregation is strictly meaningless; so, I recommend that you ignore this concept in your diagrams (link, page 68).

4.3.2 Inheritance

In class diagrams, inheritance is represented using arrows with an unfilled end. These arrows connect subclasses to their base class. In the following diagram, for example, Student and Teacher are subclasses of Person. As usual in object-oriented programming, subclasses inherit all the attributes and methods from the base class but can also add new ones. For example, Student has a course attribute and Teacher has degree.
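In Java, this hierarchy could be sketched as follows. The attributes course and degree come from the diagram description; the name attribute, the constructors, and the getters are assumptions added to make the example self-contained.

```java
class Person {
    private final String name;

    Person(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

// Student inherits name and getName() from Person and adds course.
class Student extends Person {
    private final String course;

    Student(String name, String course) {
        super(name);
        this.course = course;
    }

    String getCourse() {
        return course;
    }
}

// Teacher inherits name and getName() from Person and adds degree.
class Teacher extends Person {
    private final String degree;

    Teacher(String name, String degree) {
        super(name);
        this.degree = degree;
    }

    String getDegree() {
        return degree;
    }
}
```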

4.3.3 Dependencies

There is a dependency from class A to class B, represented by a dashed line from A to B, when class A uses class B, but this use is not through an association (i.e., A does not have an attribute of type B) or inheritance (i.e., A is not a subclass of B). Dependencies occur, for example, when a method in A has a parameter or local variable of type B or when a method in A throws an exception of type B. A dependency is a weaker form of a relationship between classes than associations and inheritance.

To illustrate the use of dependencies, consider the following code:

import java.util.Stack;

class MyClass {
   private void methodX() {
     Stack stack = new Stack();  // local variable of type Stack
   }
}

We can see that methodX of MyClass has a local variable of type java.util.Stack. In this case, we say that there is a dependency from MyClass to java.util.Stack, which is modeled in the following way:

Sometimes, along the dashed arrow, we provide information about the type of the dependency, using terms such as create (to indicate that the source class creates objects of the target class) or call (to indicate that the source class calls methods from the target class). These words are written between << and >> signs. In the following diagram, for example, ShapeFactory is a class that creates Shape objects.
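In code, a create dependency like this one might look like the sketch below. Only the names ShapeFactory and Shape come from the text; treating Shape as an interface, the Circle and Square classes, and the create method with its parameters are assumptions made for this illustration.

```java
interface Shape {
    double area();
}

class Circle implements Shape {
    private final double radius;

    Circle(double radius) {
        this.radius = radius;
    }

    public double area() {
        return Math.PI * radius * radius;
    }
}

class Square implements Shape {
    private final double side;

    Square(double side) {
        this.side = side;
    }

    public double area() {
        return side * side;
    }
}

class ShapeFactory {
    // ShapeFactory has no attribute of type Shape; it only creates and
    // returns Shape objects. Hence, this is a dependency, not an association.
    static Shape create(String kind, double size) {
        switch (kind) {
            case "circle":
                return new Circle(size);
            case "square":
                return new Square(size);
            default:
                throw new IllegalArgumentException("Unknown shape: " + kind);
        }
    }
}
```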

A class can have many dependencies. For this reason, it is uncommon to represent all of them in class diagrams. Instead, we only show the most important dependencies and those directly related to the design decisions we intend to model.
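Besides local variables, this section mentions two other sources of dependencies: parameters and thrown exceptions. The following sketch illustrates both cases with a hypothetical Report class, which depends on java.io.Writer and java.io.IOException without having attributes of these types.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

class Report {
    // Dependency on Writer (parameter type) and on IOException (declared
    // exception). Report has no attribute of either type.
    void printTo(Writer out) throws IOException {
        out.write("report body");
    }
}

class ReportDemo {
    public static void main(String[] args) throws IOException {
        StringWriter buffer = new StringWriter();
        new Report().printTo(buffer);
        System.out.println(buffer);  // prints "report body"
    }
}
```

In a sketch, we would probably draw the dashed arrow from Report to Writer only if this dependency were relevant to the design decision under discussion.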

4.4 Package Diagrams

Package diagrams are recommended when we want to provide a higher-level view of a system, showing only groups of classes—that is, packages—and the relevant dependencies between them. For this purpose, UML defines a special rectangle to represent packages, as shown below:

Unlike class rectangles, the package rectangle includes only the package name (in bold). It also has a detail at the top, in the form of a trapezoid, to distinguish it from class rectangles.

The following figure shows an example of a package diagram. It refers to a system with four main packages: MobileView, WebView, BusinessLayer, and Persistence. We can also see the dependencies—dashed arrows—that exist between them. Both View packages use classes from BusinessLayer. Classes from BusinessLayer also use classes from the View, for example, to notify them of some events. Therefore, the arrows connecting MobileView and WebView to BusinessLayer are bidirectional. Finally, only BusinessLayer uses the Persistence package.

To conclude, we would like to add two observations:

4.5 Sequence Diagrams

Sequence diagrams are dynamic diagrams, also known as behavioral diagrams. Therefore, instead of modeling classes, they model objects within a system. Additionally, they provide information about the methods that are executed in a specific usage scenario of a program. Thus, they are recommended when we have to explain the behavior of a system in a given situation. For example, at the end of this section, we will show a sequence diagram that illustrates the methods called when a client comes to an ATM and requests a deposit operation.

To kick off the presentation of sequence diagrams, let’s look at the next diagram. Although simple, it illustrates the notation used in such diagrams. As we mentioned, the diagrams model objects, which are represented as rectangles arranged at the top row. In our example, two objects are represented, named a1 and b1. Below each object, a vertical line is drawn in two forms: (1) when it is a dashed line, the object is inactive, i.e., none of its methods are being executed; (2) when the line changes to a rectangular shape, one of the object’s methods has been called and is currently under execution. When the execution ends, the line returns to the dashed form. Furthermore, a horizontal arrow marks the execution start. The return of the call is indicated by a dashed arrow, labeled with the name of the returned object. However, occasionally this arrow is omitted, such as in the case of the g method call. Two reasons can justify this omission: the return type is void, or the returned object is not relevant enough to be represented.

In the previous example, only two objects (a1 and b1) are represented, but a sequence diagram can include more objects. However, this number shouldn’t increase too much, as the diagram would become complicated to understand. For instance, it might not fit on a single page.

An object can be active and inactive multiple times within the same diagram. In other words, it can execute a method, become inactive, execute another method, go inactive again, and so on. There’s also a special case when an object invokes a method on itself, i.e., when it calls a method using this. To illustrate this case, consider the following program.

class A {
  void g() {
  }

  void f() {
    g();  // self-call, equivalent to this.g()
  }

  public static void main(String[] args) {
    A a = new A();
    a.f();
  }
}

The execution of this program is represented by the following diagram. Notice how the call to g() from f() originates a new rectangle from the rectangle representing f()’s activation.

The next diagram shows a more realistic scenario, illustrating the methods called when a client at an ATM makes a deposit into their account.
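To relate a diagram of this kind to code, the following sketch shows one possible set of classes for a deposit scenario. It is not a transcription of the diagram: the classes ATM, Bank, and Account, their methods, and the Trace helper that records the order of the calls are all hypothetical and serve only to illustrate how the messages in a sequence diagram correspond to method calls between objects.

```java
import java.util.ArrayList;
import java.util.List;

// Records the calls in the order they happen, top to bottom,
// as the arrows would appear in a sequence diagram.
class Trace {
    static final List<String> calls = new ArrayList<>();
}

class Account {
    private double balance;

    void credit(double amount) {
        Trace.calls.add("account.credit");
        balance += amount;
    }

    double getBalance() {
        return balance;
    }
}

class Bank {
    private final Account account = new Account();

    Account findAccount(String number) {
        Trace.calls.add("bank.findAccount");
        return account;  // a single account, to keep the sketch short
    }
}

class ATM {
    private final Bank bank;

    ATM(Bank bank) {
        this.bank = bank;
    }

    void deposit(String accountNumber, double amount) {
        Trace.calls.add("atm.deposit");
        Account account = bank.findAccount(accountNumber);  // returns an object
        account.credit(amount);                             // void call, no return arrow
    }
}
```

Each object that receives a call (the ATM, the bank, and the account) would have its own vertical line in the diagram, with an activation rectangle while its method is running.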

4.6 Activity Diagrams

Activity diagrams represent, at a high level, a business process flow. The main elements of these diagrams are actions, represented by rectangles. There are also control elements that define the execution order of the actions. The following figure shows an activity diagram that models the process followed after a user completes a purchase on an online store. For this, we are assuming that the products are already in the shopping cart.

To understand the semantics of an activity diagram, we should imagine that there is a token that moves through the nodes of the diagram. We will rely on this token to explain the semantics of each node of such diagrams.

Initial Node: This node creates a token to start the execution. Then, it passes the token to its output flow. By definition, the initial node does not have an input flow.

Actions: These nodes have a single input flow and a single output flow. For an action to be executed, a token needs to arrive at its input flow. After the action is completed, the token is passed on to the output flow.

Decisions: These nodes have a single input flow and two or more output flows. Each output flow has an associated boolean condition, called a guard. To make a decision, a token must be received on the input flow. When this happens, the token is passed to the output flow whose guard is true. Thus, the guards should be mutually exclusive.

Merges: These nodes can have multiple input flows but a single output flow. When a token arrives at an input flow, it is immediately passed to the output flow. Merges are used to join the flows from decision nodes.

Forks: These nodes have a single input flow and one or more output flows. They act as token multipliers: when they receive a token at the input flow, they create and pass identical tokens to each output flow. As a result, multiple parallel processes start to execute.

Joins: These nodes have multiple input flows but a single output flow. They act as token sinks: they wait for tokens to arrive at all input flows. When this happens, they pass a single token to the output. Hence, joins are used to synchronize processes, that is, to convert multiple execution flows into a single flow.

Final Node: This node can have multiple input flows but has no output flow. When a token arrives at one of the input flows, the execution of the activity diagram is terminated.
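The token semantics described above can be approximated in code. The sketch below uses CompletableFuture to mimic a decision, a fork, and a join. The OrderProcess class and the action names are hypothetical; they are loosely inspired by an online purchase, but they are not a transcription of the diagram.

```java
import java.util.concurrent.CompletableFuture;

class OrderProcess {
    // Decision: the guards (paidByCard and !paidByCard) are mutually
    // exclusive, so the token follows exactly one output flow.
    static String choosePayment(boolean paidByCard) {
        if (paidByCard) {
            return "charge card";
        } else {
            return "issue invoice";
        }
    }

    static String run(boolean paidByCard) {
        // After the if/else, both branches continue on the same path,
        // which plays the role of a merge.
        String payment = choosePayment(paidByCard);

        // Fork: one incoming token starts two parallel flows.
        CompletableFuture<String> packing =
            CompletableFuture.supplyAsync(() -> "pack items");
        CompletableFuture<String> emailing =
            CompletableFuture.supplyAsync(() -> "send email");

        // Join: wait for a token on every input flow, then continue once.
        CompletableFuture.allOf(packing, emailing).join();

        // Final node: the activity ends when the method returns.
        return payment + ", " + packing.join() + ", " + emailing.join();
    }
}
```

The difference between the two synchronization nodes is visible here: the merge does not wait for anything, since only one branch of the decision is ever taken, whereas the join blocks until both parallel flows have finished.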

In-Depth: There are at least three other alternatives for modeling processes:


Bibliography

Martin Fowler. UML Distilled: A Brief Guide to the Standard Object Modeling Language. Addison-Wesley, 2003.

Grady Booch, James Rumbaugh, Ivar Jacobson. The Unified Modeling Language User Guide. Addison-Wesley, 2005.

Craig Larman. Applying UML and Patterns: An Introduction to Object-Oriented Analysis and Design and Iterative Development. Prentice-Hall, 2004.


Exercises

1. Explain and discuss the three possible uses of UML: (a) as blueprint; (b) as sketches; (c) as programming language.

2. Describe scenarios where class diagrams can help with (a) reverse engineering tasks; (b) forward engineering tasks.

3. Model the following scenarios using class diagrams. The classes are written in a different font.

  1. A BankAccount has exactly one Customer. A Customer, in turn, may have several BankAccount. Navigation is possible in both directions.

  2. SavingsAccount and SalaryAccount are subclasses of BankAccount.

  3. BankAccount has a local variable of type Database.

  4. OrderLine refers to a single Order (without navigation). But an Order can have several OrderLine (with navigation).

  5. Student has attributes name, course, GPA (all private); and methods getCourse() and cancelEnrollment(), both public.

4. Draw a class diagram to represent the following scenario about scientific journals:

5. Draw a class diagram for the following class.

public class HelloWorldGUI {
  public static void main(String[] args) {
    JFrame frame = new JFrame("Hello world!");
  }
}

6. Draw a class diagram for the following class.

class HelloWorldGUI extends JFrame {
   public HelloWorldGUI() {
     super("Hello world!");
   }

   public static void main(String[] args) {
     HelloWorldGUI frame = new HelloWorldGUI();
   }
}

7. Draw a sequence diagram for the following code. The diagram should start with the call a.m5().

A a = new A(); // global variables
B b = new B();
C c = new C();

class C {
   void m1() { ... }
}

class B {
   void m2() { ... c.m1(); ... this.m3(); ... }
   void m3() { ... c.m1(); ... }
   void m4() { ... }
}

class A {
   void m5() { ... b.m2(); ... b.m3(); ... b.m4(); ... }
}

8. In activity diagrams, explain the difference between merges and joins.

9. What is the error in the following activity diagram? Change the diagram to fix this error and therefore to reflect the intention of the software designer.