The best way to learn Git is probably to first only do very basic things and not even look at some of the things you can do until you are familiar and confident about the basics. – Linus Torvalds
In this appendix, we introduce and discuss examples of using the Git system, currently the most widely used version control system. Inspired by the quote above from Linus Torvalds, the creator of Git, we will focus on the basic concepts and commands of this system. As suggested by the quote, it’s important to master these commands before delving into more advanced ones. If you are not familiar with the objectives and services provided by a version control system, we recommend first reading the Version Control section from Chapter 10.
To start using Git to manage the versions of a system, we must execute one of the following commands: init or clone. The init command creates an empty repository. The clone command first calls init to create an empty repository. Then, it copies into this repository all the commits from a remote repository, passed as a parameter. The following command is an example:
git clone https://github.com/USER-NAME/REPO-NAME
It clones a GitHub repository into a new subdirectory of the current directory, named after the repository. Thus, we should use clone when working on a project that is already underway and has commits on a central server. In the example, this server is on GitHub.
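The two commands can be tried without network access. The following is a minimal sketch in which a local directory stands in for the GitHub URL; the names project, project-copy, and file1 are placeholders chosen for illustration.

```shell
set -e
work="$(mktemp -d)"            # scratch directory, so no existing files are touched
cd "$work"

# init: create an empty repository in a new directory named project
git init -q project
cd project
git config user.name "Bob"     # identity used only inside this scratch repository
git config user.email "bob@example.com"
echo "x = 10;" > file1
git add file1
git commit -q -m "First commit"
cd ..

# clone: copy the repository and all of its commits into project-copy
# (a local path is used here in place of a GitHub URL)
git clone -q project project-copy
git -C project-copy log --oneline
```

The clone contains the same commit as the original repository, which is what makes clone the natural starting point for a project that already has history.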
Commits are used to create snapshots (or photos) of a system’s files. Once these snapshots have been taken, they are stored in the version control system in a compact and efficient manner to minimize disk space usage. Subsequently, we can retrieve any of the snapshots. For instance, we may want to restore an old implementation of a specific file.
Developers are advised to make commits periodically, especially when they have introduced significant changes to the code. In distributed version control systems such as Git, commits are initially stored in the developer’s local repository. Thus, the cost of a commit is minimal, allowing developers to make multiple commits throughout a working day. However, it is not recommended to make large commits that involve substantial modifications in multiple files. Additionally, changes related to more than one maintenance task should not be included in the same commit. For instance, fixing two bugs in the same commit is not advisable. Instead, each bug should be addressed in a separate commit. This practice simplifies code review, especially in cases where a customer complains that a particular bug has not been resolved.
Commits also contain metadata, including date, time, author, and a message describing the modification made by the commit. The next figure shows a GitHub page that displays the main metadata of a commit from the google/guava repository. We can see that the commit refers to a refactoring, which is clear from its title. Then, the refactoring is explained in detail in the commit message. In the last line of the figure, we can see the author’s name and the information that the commit was made 13 days ago.
In the last line of the figure, we can also note that every commit has a unique identifier, in this case:
1c757483665f0ba8fed31a2af7e31643a4590256
This identifier has 20 bytes, normally represented in hexadecimal. It provides a checksum of the commit’s content, computed using an SHA-1 hash function.
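To see such an identifier locally, the sketch below creates one commit in a scratch repository and prints its ID. It assumes the repository uses Git's default SHA-1 object format; the file name is a placeholder.

```shell
set -e
cd "$(mktemp -d)"              # scratch directory
git init -q
git config user.name "Bob"
git config user.email "bob@example.com"
echo "x = 10;" > file1
git add file1
git commit -q -m "First commit"

id="$(git rev-parse HEAD)"     # full identifier of the most recent commit
echo "$id"
echo "${#id} hexadecimal digits"   # two digits per byte, so 20 bytes give 40 digits
git rev-parse --short HEAD     # abbreviated form, enough to name the commit in most commands
```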
Locally, the Git system has three distinct areas:
A working directory, where we save the files we intend to version. Sometimes, this area is also called a working tree.
The repository itself, which stores the commit history.
An intermediate area, called index or stage, which temporarily stores the files intended for versioning. Such files are referred to as tracked.
Among these areas, the developer only accesses the working directory, which functions as a regular operating system directory. The other two areas belong to Git and are managed solely by it. Like any directory, the working directory can contain various files. However, only the ones added to the index, by means of a git add, are managed by Git.
In addition to storing the list of versioned files, the index also stores their content. Thus, before conducting a git commit, we must execute a git add to save the file’s content to the index. Having done that, we should use a git commit to store the version added to the index in the local repository. This process is illustrated in the next figure.
Example: Consider the following simple file, which is sufficient to explain the add and commit commands.
// file1
x = 10;
After creating this file, the developer executed the following command:
git add file1
It adds file1 to the index (or stage). However, immediately thereafter, the developer modified the file again:
// file1
x = 20; // new value for x
Having done that, the developer executed:
git commit -m "New value of x"
The -m flag provides the message that describes the commit. However, the point we want to stress here is this: since the user did not execute a new add after changing the value of x to 20, the commit will not save the most recent version of the file. Instead, the version of file1 that will be versioned is the one where x equals 10 because it is the version in the index.
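The behavior described in this example can be checked in a scratch repository. The sketch below repeats the same steps and then compares the committed content with the working directory.

```shell
set -e
cd "$(mktemp -d)"              # scratch directory
git init -q
git config user.name "Bob"
git config user.email "bob@example.com"

echo "x = 10;" > file1
git add file1                  # the index now holds the x = 10 version
echo "x = 20; // new value for x" > file1   # changed again, but not added again
git commit -q -m "New value of x"

git show HEAD:file1            # content stored by the commit
cat file1                      # content in the working directory
git status --short             # file1 is reported as modified and not yet added
```

The commit stores the x = 10 version, while the working directory still holds x = 20 as an uncommitted change.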
To avoid this problem, it’s common to use a commit like this:
git commit -a -m "New value of x"
The -a option indicates that before executing the commit, we want to add to the index all tracked files that have been modified since the last commit. Because it only covers tracked files, the -a option does not eliminate the need to use add. We still need to use this command at least once to tell Git that we want to track a specific file.
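A short sketch of this distinction follows. It assumes a scratch repository with one tracked file (file1) and one file that was never added (file2, a name introduced only for this illustration).

```shell
set -e
cd "$(mktemp -d)"              # scratch directory
git init -q
git config user.name "Bob"
git config user.email "bob@example.com"

echo "x = 10;" > file1
git add file1                  # a single add is enough to make file1 tracked
git commit -q -m "Initial value of x"

echo "x = 20;" > file1         # modify the tracked file
echo "y = 1;" > file2          # create a file that was never added
git commit -q -a -m "New value of x"

git show HEAD:file1            # the modified version of file1 was committed
git status --short             # file2 is still listed as untracked (??)
```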
Just like there is an add, there is also a command to remove a file from a Git repository. An example is as follows:
git rm file1.txt
git commit -m "Removed file1.txt"
Besides removing the file from the local Git repository, the rm command also deletes it from the working directory.
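The effect on both areas can be observed with a sketch like the one below, run in a scratch repository.

```shell
set -e
cd "$(mktemp -d)"              # scratch directory
git init -q
git config user.name "Bob"
git config user.email "bob@example.com"

echo "x = 10;" > file1.txt
git add file1.txt
git commit -q -m "Added file1.txt"

git rm -q file1.txt            # removes the file from the index and from disk
git commit -q -m "Removed file1.txt"

ls -A                          # only the .git directory remains
git ls-files                   # prints nothing: no file is tracked anymore
```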
The status command is one of the most used Git commands. Among other information, it shows the state of the working directory and the index. For example, it can be used to show information about:
Files in the working directory that have been changed but haven’t been added to the index yet.
Files in the working directory that are not tracked by Git, meaning they have not been subject to an add.
Files that are in the index, waiting for a commit.
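The three situations listed above can be produced side by side. In this sketch, the file names are placeholders that describe the state of each file, and the compact --short format is used to keep the output small.

```shell
set -e
cd "$(mktemp -d)"              # scratch directory
git init -q
git config user.name "Bob"
git config user.email "bob@example.com"

echo "a" > tracked.txt
git add tracked.txt
git commit -q -m "First commit"

echo "b" >> tracked.txt        # changed, but not added to the index yet
echo "c" > staged.txt
git add staged.txt             # in the index, waiting for a commit
echo "d" > untracked.txt       # never subject to an add

# Two status columns per file: the first for the index, the second for the
# working directory; ?? marks untracked files.
git status --short
```

Running git status without --short reports the same information in a longer, explanatory format.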
The git diff command highlights modifications made to the files in the working directory that haven’t been moved to the index yet. For each modified file, the command shows the lines that have been added (+) and removed (-). git diff is frequently used before an add/commit to check the changes that will be recorded in the version control system.
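The sketch below shows the command before and after an add. It also uses the --staged option, which is not discussed in the text: it compares the index with the last commit instead of comparing the working directory with the index.

```shell
set -e
cd "$(mktemp -d)"              # scratch directory
git init -q
git config user.name "Bob"
git config user.email "bob@example.com"

echo "x = 10;" > file1
git add file1
git commit -q -m "Initial value of x"

echo "x = 20;" > file1
git diff                       # shows the old line with - and the new line with +

git add file1
git diff                       # prints nothing: the change has moved to the index
git diff --staged              # shows the change that the next commit will record
```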
The git log command shows information on the latest commits, including date, author, time, and description.
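As a brief illustration, the following sketch creates two commits in a scratch repository and lists them in the full format and in a compact one (the --oneline and -n options are standard Git options, not covered in the text).

```shell
set -e
cd "$(mktemp -d)"              # scratch directory
git init -q
git config user.name "Bob"
git config user.email "bob@example.com"

echo "x = 10;" > file1
git add file1
git commit -q -m "Initial value of x"
echo "x = 20;" > file1
git commit -q -a -m "New value of x"

git log                        # full listing: ID, author, date, and message
git log --oneline -n 1         # compact listing, limited to the latest commit
```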
The push command copies the most recent commits from the local repository to a remote repository. Because it involves network communication, it is generally slower than a commit. A push should be used when a developer wants to make a given modification visible to other developers. To update their local repository, the other team members must use a pull command. This command performs two main operations:
First, a pull copies the most recent commits from the remote repository to the developer’s local repository. This operation is called fetch.
Then, the files in the working directory are updated. This operation is called merge.
The next figure illustrates the functioning of the push and pull commands.
Example: Assume that in the central repository of a project there is the following file:
void f() {
...
}
Imagine two developers, named Bob and Alice, performed a pull and, hence, copied this file into their local repositories and their respective working directories. The syntax of this command is:
git pull
On the same day, Bob implemented a second function g in this file:
void f() {
...
}
void g() { // by Bob
...
}
Next, Bob executed an add, commit, and push. The syntax of the push is:
git push origin main
The origin parameter is the default name that Git assigns to the remote repository, such as a GitHub repository. The main parameter refers to the main branch. We will study more about branches soon.
After running the above push command, the new version of the file will be copied to the remote repository. A few days later, Alice decided she needed to modify that same file. Since she had been away from the project for a while, it is recommended that she first execute a pull to update her local repository and working directory with the changes that occurred in that period, like the one made by Bob. Thus, after the pull, the file in question will be updated on Alice’s machine to include the function g implemented by Bob.
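This scenario can be reproduced on a single machine. In the sketch below, a local bare repository (central.git) stands in for GitHub, and two directories play the roles of Bob's and Alice's machines. The file name file.c and the simplified function bodies are assumptions made for illustration.

```shell
set -e
root="$(mktemp -d)"                        # scratch directory
git init -q --bare "$root/central.git"     # stand-in for the remote repository
git -C "$root/central.git" symbolic-ref HEAD refs/heads/main

# Bob publishes the first version of the file.
git init -q "$root/bob"
cd "$root/bob"
git symbolic-ref HEAD refs/heads/main      # name the initial branch main
git config user.name "Bob"
git config user.email "bob@example.com"
git remote add origin "$root/central.git"
echo "void f() {}" > file.c
git add file.c
git commit -q -m "Add f"
git push -q origin main

# Alice obtains her copy; afterwards Bob adds g and pushes again.
git clone -q "$root/central.git" "$root/alice"
echo "void g() {} // by Bob" >> file.c
git commit -q -a -m "Add g"
git push -q origin main

# Alice updates her repository: pull performs a fetch followed by a merge.
cd "$root/alice"
cat file.c                                 # still contains only f
git pull -q origin main
cat file.c                                 # now contains f and g
```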
Merge conflicts occur when two or more developers modify the same section of the code at the same time. To better understand this situation, let’s use an example.
Example: Suppose Bob implemented the following program:
main() {
print("Helo, world!");
}
Upon completing the implementation, Bob executed an add, followed by commit, and push.
Next, Alice performed a pull and retrieved the file implemented by Bob. Then, she decided to translate the program’s message into Portuguese.
main() {
print("Olá, mundo!");
}
While Alice was making the translation, Bob noticed that he wrote Hello incorrectly, with only one l. However, Alice was faster and executed the trio of commands add, commit, and push.
Bob, after correcting the typo, executed an add, followed by a commit. Lastly, he performed a push, but this command failed with the following message:
The message is clear: Bob can’t execute a push as the remote repository contains a new version of the file, in this case, pushed by Alice. Thus, Bob needs first to perform a pull. However, when he does this, he receives a new error message:
This new message is also clear: there is a merge conflict in file2. After opening this file, Bob realizes that Git modified it to highlight the conflict-generating lines:
main() {
<<<<<<< HEAD
print("Hello, world!");
=======
print("Olá, mundo!");
>>>>>>> f25bce8fea85a625b891c890a8eca003b723f21b
}
These modifications should be understood as follows:
Between <<<<<<< HEAD and ======= we have the code modified by Bob, who couldn’t execute a push and had to execute a pull. HEAD indicates that this code was modified in Bob’s most recent commit.
Between ======= and >>>>>>> f25bce8... we have the code modified by Alice, who successfully executed the push. f25bce8... is the ID of the commit in which Alice modified this code.
It’s then up to Bob to resolve the conflict, which is a manual task. He must choose which section of the code will prevail—his code or Alice’s—and edit the file according to his choice, thus removing the delimiters inserted by Git.
Let’s assume that Bob decides Alice’s code is correct, since the system is now using messages in Portuguese. Therefore, he should edit the file so that it looks like this:
main() {
print("Olá, mundo!");
}
Note that Bob removed the delimiters inserted by Git (<<<<<<< HEAD, =======, and >>>>>>> f25bce8...), as well as the print command with the message in English. After leaving the code in the correct form, Bob should execute the commands add, commit, and push again, which will now be successful.
In this example, we showed a simple conflict, confined to a single line of a single file. However, a pull can give rise to more complex conflicts. For instance, the same file may include several conflicts. We can also have conflicts across more than one file.
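The whole conflict scenario can be replayed locally. The sketch below uses a local bare repository (central.git) in place of GitHub and one directory per developer. Two details are assumptions added to make it scriptable: the --no-rebase option, which asks pull to integrate by merging regardless of local configuration, and the redirection of error output, which hides Git's messages so that the sketch prints its own short notices instead.

```shell
set -e
root="$(mktemp -d)"                        # scratch directory
git init -q --bare "$root/central.git"     # stand-in for the remote repository
git -C "$root/central.git" symbolic-ref HEAD refs/heads/main

# Bob publishes the program containing the typo.
git init -q "$root/bob"
cd "$root/bob"
git symbolic-ref HEAD refs/heads/main
git config user.name "Bob"
git config user.email "bob@example.com"
git remote add origin "$root/central.git"
printf 'main() {\n  print("Helo, world!");\n}\n' > file2
git add file2
git commit -q -m "Hello world program"
git push -q origin main

# Alice translates the message and pushes first.
git clone -q "$root/central.git" "$root/alice"
cd "$root/alice"
git config user.name "Alice"
git config user.email "alice@example.com"
printf 'main() {\n  print("Olá, mundo!");\n}\n' > file2
git commit -q -a -m "Translate message to Portuguese"
git push -q origin main

# Bob fixes the typo; his push is rejected and his pull reports a conflict.
cd "$root/bob"
printf 'main() {\n  print("Hello, world!");\n}\n' > file2
git commit -q -a -m "Fix typo"
git push -q origin main 2>/dev/null || echo "push rejected"
git pull -q --no-rebase origin main 2>/dev/null || echo "merge conflict"
conflicted="$(cat file2)"
echo "$conflicted"                         # the file with the delimiters inserted by Git

# Bob keeps Alice's version, then concludes the merge and pushes.
printf 'main() {\n  print("Olá, mundo!");\n}\n' > file2
git add file2
git commit -q -m "Merge: keep the Portuguese message"
git push -q origin main
```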
Git organizes the workspace into virtual folders, named branches. So far, we have not discussed branches because every repository has a default branch, named main, created by the init command. If we do not concern ourselves with branches, all development will occur on this branch. However, in some cases, it might be beneficial to create other branches to better organize the development. Thus, to explain the concept of branches, let’s use another example.
Example: Suppose Bob is responsible for maintaining a certain feature of a system. For simplicity, let’s assume this feature is implemented in a single function f. Bob had the idea to completely change the implementation of f to use a more efficient algorithm and data structure. For this, Bob will need a few weeks. However, despite being optimistic, Bob is not sure if the new implementation will provide the gains he anticipates. Finally, but not to be overlooked, during the new implementation, Bob might need to access the original code of f, for example, to fix bugs reported by users.
This is an interesting scenario for Bob to create a branch to implement and test, in isolation, this new version of f. To do this, he should use:
git branch f-new
This command creates a new branch, named f-new, presuming that this branch does not already exist.
To switch from the current branch to a new branch, we should use git checkout [branch-name]. To find the name of the current branch, we simply use git branch. In reality, this command lists all the branches and shows which one is current.
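The following sketch shows these commands in sequence, in a scratch repository whose initial branch is explicitly named main. The file f.c is a placeholder for the file that implements f.

```shell
set -e
cd "$(mktemp -d)"                      # scratch directory
git init -q
git symbolic-ref HEAD refs/heads/main  # name the initial branch main
git config user.name "Bob"
git config user.email "bob@example.com"
echo "void f() {}" > f.c
git add f.c
git commit -q -m "Original f"

git branch f-new           # creates the branch; the current branch is still main
git branch                 # lists f-new and main, marking the current one with *
git checkout -q f-new      # switches the current branch to f-new
git branch                 # the * marker is now on f-new
```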
As we mentioned, we can conceptualize branches as virtual subdirectories within the working directory. The key distinction lies in the fact that branches are managed by Git, not by the operating system, making them virtual in nature. Expanding on this analogy, the git branch [name] command is akin to the mkdir [name] command, but Git not only creates the branch but also copies all the files from the parent branch to it. In contrast, directories created by the operating system are initially empty. The git checkout [name] command is similar to a cd [name] command, while git status combines aspects of both ls and pwd commands.
Usually, we also have the option to customize the operating system prompt by including information about the current directory. A similar customization is possible with Git branches. Consequently, the prompt exhibited by Git can take, for example, the following form: ~/projects/systemXYZ/main.
However, there’s an important difference between branches and directories. As a rule, a developer should only switch the current branch from A to B after saving their modifications to A, meaning they have first executed add and commit. If these commands are omitted and the switch would overwrite the uncommitted changes, git checkout B will fail, resulting in the following error message:
Returning to the example, after Bob has created his branch, he must proceed in the following way. When he plans to work on the new implementation of f, he should first switch the current branch to f-new. On the other hand, when he needs to modify the original code of f (the production code), he should make sure that the current branch is main. Regardless of which branch he is on, Bob must use add and commit to save the state of his work.
Bob will continue with this workflow, alternating between the f-new and main branches until the new implementation of f is completed. When this happens, Bob should merge the new code into the original one. However, with the use of branches, he no longer needs to perform this operation manually. Git provides a command called merge that handles this integration for him. The syntax is as follows:
git merge f-new
This command must be invoked on the branch that will receive the modifications from f-new. In our case, on the main branch.
As the reader may be thinking, a merge can generate conflicts, also known as integration conflicts. In the specific case of merging branches, these conflicts will occur when both the branch receiving the modifications (main, in our example) and the branch being integrated (f-new, in our example) have modified the same lines of the code. As discussed in Section A.6, Git detects and delimits the conflict areas, and it is up to the developer who called the merge to resolve it, i.e., choose the code that should prevail.
Finally, after performing the merge, Bob can remove the f-new branch if it is no longer needed. The commits made on it remain reachable from main, so they are not lost. To delete f-new, he must execute the following command on the main branch:
git branch -d f-new
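Bob's complete workflow, from creating the branch to deleting it, can be sketched as follows. The files f.c and h.c and their contents are hypothetical; h.c represents production code that receives a bug fix on main while f is being rewritten. The --no-edit option accepts the merge message proposed by Git, so no editor is opened.

```shell
set -e
cd "$(mktemp -d)"                      # scratch directory
git init -q
git symbolic-ref HEAD refs/heads/main  # name the initial branch main
git config user.name "Bob"
git config user.email "bob@example.com"
echo "void f() { /* original */ }" > f.c
echo "void h() {}" > h.c
git add f.c h.c
git commit -q -m "Original code"

git branch f-new
git checkout -q f-new                  # work on the new implementation
echo "void f() { /* new algorithm */ }" > f.c
git commit -q -a -m "Reimplement f"

git checkout -q main                   # back to the production code
cat f.c                                # main still has the original f
echo "void h() { /* bug fix */ }" > h.c
git commit -q -a -m "Fix bug in h"

git merge -q --no-edit f-new           # run on main, the branch receiving the changes
cat f.c                                # main now has the new implementation
git branch -d f-new                    # the branch is merged, so -d accepts the deletion
```

Since the two branches changed different files, the merge completes without conflicts.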
Commits may have zero, one, or more parents (or predecessors). As the next figure illustrates, the first commit of a repository does not have a parent. A merge commit, however, has two or more parents, representing the branches that were merged. For example, commit 10 in the figure has two parents. The other commits in this figure have exactly one parent node.
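The parents of each commit can be listed with git log. The sketch below builds a small history with one merge; the branch name topic and the file names are placeholders, and the %h and %p format codes print the abbreviated IDs of a commit and of its parents.

```shell
set -e
cd "$(mktemp -d)"                      # scratch directory
git init -q
git symbolic-ref HEAD refs/heads/main  # name the initial branch main
git config user.name "Bob"
git config user.email "bob@example.com"

echo "a" > a.txt
git add a.txt
git commit -q -m "First commit"        # no parent
git branch topic
echo "b" > b.txt
git add b.txt
git commit -q -m "Work on main"        # one parent
git checkout -q topic
echo "c" > c.txt
git add c.txt
git commit -q -m "Work on topic"       # one parent
git checkout -q main
git merge -q --no-edit topic           # merge commit with two parents

# One line per commit: its ID, an arrow, and the IDs of its parents.
git log --format='%h <- %p'
```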
A branch is nothing more than an internal Git variable containing the identifier of the last commit made on this branch. There is also a variable called HEAD, which points to the current branch’s variable. That is, HEAD contains the name of the variable holding the identifier of the current branch’s last commit. Here is an example:
In this example, there are two branches, represented by the MAIN and ISSUE-45 variables. Each one points to the last commit of its respective branch. The HEAD variable points to the MAIN variable. This means that the current branch is MAIN. If a commit is made, the graph changes to:
The new commit has identifier 7. It was made on MAIN, since HEAD was pointing to this branch’s variable. The parent of the new commit is the old HEAD, i.e., commit 3. The MAIN variable moved forward to point to the new commit. This means that if the branch isn’t changed, the parent of the next commit will be commit 7.
However, if we switch to the ISSUE-45 branch, the graph would be the one shown in the next figure. The only change is that the HEAD variable now points to ISSUE-45. This is enough to direct the next commit to this branch, i.e., for this commit to have commit 6 as its parent.
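These variables can be inspected directly. The sketch below uses two standard commands that are not covered in the text: git symbolic-ref, which prints the branch variable that HEAD refers to, and git rev-parse, which prints the commit that a branch variable points to. The branch names follow the figures, written in lowercase.

```shell
set -e
cd "$(mktemp -d)"                      # scratch directory
git init -q
git symbolic-ref HEAD refs/heads/main  # make HEAD refer to a branch named main
git config user.name "Bob"
git config user.email "bob@example.com"
echo "x = 10;" > file1
git add file1
git commit -q -m "First commit"
git branch issue-45

git symbolic-ref HEAD          # refs/heads/main: main is the current branch
old="$(git rev-parse main)"    # commit that main points to before the new commit

echo "x = 20;" > file1
git commit -q -a -m "Second commit"
git rev-parse main             # main moved forward to the new commit
git rev-parse "main^"          # the parent of the new commit is the old value of main

git checkout -q issue-45
git symbolic-ref HEAD          # refs/heads/issue-45: only HEAD changed
git rev-parse issue-45         # issue-45 still points to the first commit
```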
Up until now, we’ve been working with local branches, i.e., the branches we’ve discussed exist only in the local repository. However, it is also possible to push a local branch to a remote repository. To illustrate this feature, let’s use an example similar to the one in the previous section.
Example: Suppose that Bob created a branch called g-new to implement a new functionality. He made some commits on this branch, and now he would like to share it with Alice so that she can collaborate on this new implementation. To achieve this, Bob should use the following push:
git push -u origin g-new
This command executes a push of the current branch (g-new) to the remote repository, referred to as origin by Git. The remote repository can be, for instance, a GitHub repository. The -u parameter indicates that, in the future, we will sync the two repositories using a pull (the letter in the parameter refers to upstream). This syntax applies only for the first push of a remote branch. In the following commands, we can omit -u, i.e., just use git push origin g-new.
In the remote repository, a g-new branch will be created. To work on this branch, Alice must first create it on her local machine and then associate it with the remote branch. For this, she should execute the following commands on the main branch:
git pull
git checkout -t origin/g-new
The first command makes the remote branch visible on her local machine. The second command creates a local g-new branch, which Alice will use to track changes on the remote branch. This is indicated by the -t parameter, short for tracking. Next, Alice can make commits to this branch. Finally, when she is ready to publish her changes, she should execute a push, with the usual syntax, i.e., without the -u parameter.
After that, Bob can execute a pull and conclude, for example, that the implementation of the new functionality is finished and can be merged into the main branch. He can also delete the local and remote branches using:
git branch -d g-new
git push origin --delete g-new
Alice can also delete her local branch by using:
git branch -d g-new
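The steps of this example can be replayed with a local bare repository (central.git) standing in for GitHub. In the sketch below, the file names and commit messages are placeholders, and the final push of main is an extra step, not described in the text, that publishes the merged code.

```shell
set -e
root="$(mktemp -d)"                        # scratch directory
git init -q --bare "$root/central.git"     # stand-in for the remote repository
git -C "$root/central.git" symbolic-ref HEAD refs/heads/main

# Bob publishes main, and Alice clones the repository.
git init -q "$root/bob"
cd "$root/bob"
git symbolic-ref HEAD refs/heads/main
git config user.name "Bob"
git config user.email "bob@example.com"
git remote add origin "$root/central.git"
echo "void f() {}" > f.c
git add f.c
git commit -q -m "Add f"
git push -q -u origin main
git clone -q "$root/central.git" "$root/alice"

# Bob creates g-new and shares it.
git branch g-new
git checkout -q g-new
echo "void g() {}" > g.c
git add g.c
git commit -q -m "Start g"
git push -q -u origin g-new

# Alice creates a local g-new that tracks the remote branch, and contributes.
cd "$root/alice"
git config user.name "Alice"
git config user.email "alice@example.com"
git pull -q
git checkout -q -t origin/g-new
echo "// reviewed by Alice" >> g.c
git commit -q -a -m "Review g"
git push -q origin g-new

# Bob retrieves her commit, merges g-new into main, and deletes the branch.
cd "$root/bob"
git pull -q
git checkout -q main
git merge -q g-new
git push -q origin main
git branch -d g-new
git push -q origin --delete g-new
```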
Pull requests are a mechanism that allows a branch to be reviewed and discussed before it is integrated into the main branch. When using pull requests, a developer first implements some features in a separate branch. Once this implementation is finished, they do not immediately integrate the new code into the main branch. Instead, they open a request for their branch to be reviewed and approved by a second developer. This request for review and integration is called a pull request. This mechanism is common on GitHub, but it has equivalents in other version control systems.
Nowadays, the review and integration process takes place via a web interface provided, for instance, by GitHub. However, if this interface did not exist, the reviewer would have to start their work by performing a pull of the branch to their local machine. This is the origin of the name: a pull request is a request for another developer to review and integrate a certain branch. To fulfill this request, when not using a web interface, this reviewer should begin by performing a pull of the branch.
Next, we detail the process of submitting and reviewing pull requests using an example.
Example: Suppose that Bob and Alice are members of an organization that maintains a repository called awesome-git, with a list of interesting links about Git. The links are stored in the README.md file of this repository. Any member of the organization can suggest the addition of links to this page. However, they cannot do a push directly to the main branch. Instead, the suggestion needs to be reviewed and approved by another team member.
Bob then decided to suggest adding this appendix to this list. To do so, he first cloned the repository and created a branch, named se-book-appendix, using the following commands:
git clone https://github.com/aserg-ufmg/awesome-git.git
cd awesome-git
git checkout -b se-book-appendix
Then, Bob edited the README.md file, adding the URL of this appendix. Finally, he carried out an add, commit, and pushed the branch to GitHub:
git add README.md
git commit -m "SE: A Modern Approach - Appendix A - Git"
git push -u origin se-book-appendix
Actually, these steps are not new compared to what we presented in the previous section. However, the differences start now. First, Bob should go to the GitHub page and select the se-book-appendix branch. Once this is done, GitHub displays a button to create pull requests. Bob should click on this button and describe his pull request, as shown in the next figure.
A pull request is a request for another developer to review and, if appropriate, merge a branch you have created. Consequently, pull requests are a way for an organization to adopt code reviews. That is, developers do not directly integrate their code into the remote repository’s main branch. Instead, they request other developers to first review this code and then merge it.
On GitHub’s pull request creation page, Bob can invite Alice to review his code. She will then be notified that there is a pull request waiting for review. Also via GitHub’s interface, Alice can review the commits from Bob’s pull request. For example, she can inspect a diff between the new and old code. If necessary, Alice can exchange messages with Bob to clarify doubts about the code. She can also request changes in this code. In this case, Bob should provide the changes and carry out a new add, commit, and push. The new commits will be automatically appended to the pull request, so Alice can check if her request has been met. Once the modification is approved, Alice should integrate the code into the main branch, by clicking a button on the pull request review page.
Squash is an operation that merges several commits into a single commit. It is recommended, for example, before submitting pull requests.
Example: In the previous example, suppose the pull request created by Bob has five commits. Specifically, he is suggesting the addition of five new links to the awesome-git repository, which he gathered over some weeks. After discovering each link, Bob performed a commit on his machine. In fact, he plans to create the pull request only after accumulating five commits.
However, to facilitate the review of his pull request by Alice, Bob intends to merge the five commits into a single one. Thus, instead of analyzing five commits, Alice will need to review only one. The submitted modification is exactly the same, i.e., it consists of adding five links to the page. However, instead of the changes being distributed across five commits, they are consolidated into a single one.
To perform a squash, Bob should call:
git rebase -i HEAD~5
The number 5 means that he intends to merge the last five commits in the current branch. After that, Git opens a text editor with a list containing the ID and description of each commit, as shown below:
pick 16b5fcc Including link 1
pick c964dea Including link 2
pick 06cf8ee Including link 3
pick 396b4a3 Including link 4
pick 9be7fdb Including link 5
Bob should use the editor itself to replace the word pick with squash, except for the one in the first line. The file will then look like this:
pick 16b5fcc Including link 1
squash c964dea Including link 2
squash 06cf8ee Including link 3
squash 396b4a3 Including link 4
squash 9be7fdb Including link 5
Then, Bob should save this file. Automatically, Git opens a new editor for him to enter the message of the new commit, that is, the commit merging the five listed commits. After providing this message, Bob should save the file, and then the squash is completed.
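To experiment with squash without typing in an editor, the two editing sessions can be scripted. The sketch below is one possible way to do this, under assumptions that are not part of the text: the GIT_SEQUENCE_EDITOR variable replaces the first editor with a sed command that changes pick to squash on every line after the first, and GIT_EDITOR=true accepts the combined message proposed by Git. The initial commit and the README.md contents are placeholders.

```shell
set -e
cd "$(mktemp -d)"              # scratch directory
git init -q
git config user.name "Bob"
git config user.email "bob@example.com"
echo "# awesome-git" > README.md
git add README.md
git commit -q -m "Initial list"

# Five small commits, one per link.
for i in 1 2 3 4 5; do
  echo "link $i" >> README.md
  git commit -q -a -m "Including link $i"
done
git log --oneline              # six commits before the squash

# Scripted stand-ins for the two editor sessions described in the text.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2,\$s/^pick/squash/'" GIT_EDITOR=true \
  git rebase -q -i HEAD~5

git log --oneline              # two commits: the initial one and the squashed one
```

In everyday use, Bob would run only git rebase -i HEAD~5 and edit both files by hand, as described above.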
Fork is the mechanism provided by GitHub to clone remote repositories, i.e., repositories stored on GitHub. A fork is performed via GitHub’s interface. On the page of any repository, there is a button to perform this operation. If we fork the torvalds/linux repository, a copy of this repository will be created in our GitHub account, named, for example, mtov/linux.
As we always do, let’s use an example to explain this operation.
Example: Consider the aserg-ufmg/awesome-git repository, used in the example about pull requests. Also, consider a third developer, named Carol. However, since Carol is not a member of the ASERG/UFMG organization, she doesn’t have permission to perform a push in this repository, as Bob did in the previous example. Despite this, Carol believes that an important and interesting link is missing from the current list, and she would like to suggest its inclusion. But remember: Carol cannot follow the same steps used by Bob in the previous example, as she doesn’t have permission to push to the repository in question.
To solve this problem, Carol should start by forking the repository. To do so, she just needs to click on the fork button, which exists on the page of any GitHub repository. After that, she will have a new repository in her GitHub account, whose name is carol/awesome-git. Then, she can clone this repository to her local machine, create a branch, add the link she wants to the list, and perform an add, commit, and push. This last operation will be carried out in the forked repository. Finally, Carol should go to the page of her GitHub fork and create a pull request. Since the repository is a fork, she has an extra option: to direct the pull request to the original repository. Thus, the developers of the original repository, like Bob and Alice, will be responsible for reviewing and, possibly, accepting the pull request.
Therefore, a fork is a mechanism that, when combined with pull requests, allows an open-source project to receive contributions from other developers. To explain a bit better, an open-source project can receive contributions—more specifically, commits—not only from its team of developers (Bob and Alice, in our example) but also from any other developer with a GitHub account (like Carol).
Scott Chacon, Ben Straub. Pro Git. 2nd edition, Apress, 2014.
Rachel M. Carmena. How to teach Git. Blog post.
Try to reproduce each of the examples presented in this appendix. In the examples involving remote repositories, we suggest using a GitHub repository. In examples involving two users (Alice and Bob, for example), we suggest creating two local directories and using them to reproduce each user’s commands.
This book was formatted using the Pandoc system to convert Markdown to LaTeX and subsequently generate a PDF file. The font is Computer Modern, 11pt. Additionally, from the Markdown files, the EPUB and HTML versions were generated.