Shivani Sharma — September 6, 2021
Advanced Model Deployment Programming Technique

This article was published as a part of the Data Science Blogathon

Introduction

Building, running, inspecting, and moving containers and images with the Docker CLI is as easy as shelling peas, but have you ever wondered how the internals behind that interface actually work? The simple interface hides a lot of advanced technology, and in this article we will look at one piece of it: the union filesystem that underlies every container and image layer. Seasoned practitioners of containerization and orchestration are unlikely to find anything new here, but the material should be useful for those taking their first steps in DevOps.

What is a Unified Filesystem?

A union mount (also called a cascaded or merged mount) is a type of filesystem that creates the illusion of combining the contents of multiple directories into one without changing the original (physical) sources. This approach is useful if you have related sets of files stored in different locations or on different media, but you want to present them as a single, cumulative whole. For example, a set of users’ home directories located on remote NFS servers can be consolidated into a single directory, or a split ISO image can be merged into one complete image.

However, a union mount (or unified filesystem) is not really a separate type of filesystem. Rather, it is a concept with many implementations. Some implementations are faster, some are slower, some are easier to use, some are more complex; simply put, different implementations have different goals and different levels of maturity. So before diving into the details, let’s take a look at some of the more popular unified filesystem implementations:

  • Let’s start with the original union filesystem, UnionFS. It is no longer maintained: the last code change was recorded in August 2014. For more information on this filesystem, see unionfs.filesystems.org.

  • aufs is an alternative implementation of the original UnionFS with many new features added. It was never merged into the mainline Linux kernel, so it cannot be used with a vanilla kernel. Aufs was the default storage driver for Docker on Ubuntu/Debian, but over time it was replaced by OverlayFS (for Linux kernels newer than 4.0). It has a number of advantages over other union filesystems.

  • The next one, OverlayFS, has been included in the Linux kernel since version 3.18 (October 26, 2014). It is the filesystem used by Docker’s default overlay2 storage driver (you can check this by running docker system info | grep Storage). It generally performs better than aufs and has some interesting features, such as page cache sharing.

  • ZFS is a unified filesystem developed by Sun Microsystems (now part of Oracle). It provides a number of useful features, such as hierarchical checksumming, native snapshots and backup/replication, and native compression and deduplication (elimination of redundant data). However, ZFS is released under the Common Development and Distribution License (CDDL), which is generally considered incompatible with the GPL, so it cannot be shipped as part of the Linux kernel. You can still use the ZFS on Linux (ZoL) project, which the Docker documentation describes as healthy and maturing, but alas, not ready for production use. If you would like to work with this filesystem, look for the ZFS on Linux project.

  • Btrfs is another option, a collaborative project of many companies, including SUSE, WD, and Facebook. It is released under the GPL and is part of the Linux kernel. Btrfs is the default filesystem in Fedora 33. If you are not put off by the trouble of switching Docker to a non-default storage driver, Btrfs with its features and performance might be the best option.

For a more in-depth look at the characteristics of the storage drivers used in Docker, the Docker docs provide a comparison table for the different drivers. If you are unsure which one to choose (I have no doubt that some programmers know all the intricacies of filesystems, but this article is not intended for them), stick with the default overlay2 driver. That is what I will use in the examples in the rest of this article.

Why exactly Unified Filesystem?

Many images used to run containers are quite large: for example, ubuntu is 72 MB and nginx is 133 MB. It would be wasteful to allocate that much space every time you create a container from one of these images. With a union filesystem, Docker only creates a thin writable layer on top of the image, and the rest of the image is shared among all containers. We also get faster startup times, because no image files or data need to be copied. If a container needs to change a read-only file, it uses the copy-on-write strategy (we’ll discuss this a bit later), which copies the content to the top writable layer, where it can be safely changed.
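To put rough numbers on the saving, here is a small back-of-the-envelope sketch. The image size is the nginx figure quoted above; the number of containers and the 1 MB size of each thin writable layer are assumptions picked purely for illustration.

```shell
#!/bin/sh
# Disk usage for N containers started from one image, in MB.
image_mb=133      # nginx image size quoted in the text
containers=10     # assumed number of containers
thin_mb=1         # assumed size of each container's writable layer

# Without layer sharing, every container carries a full copy of the image.
full_copy=$((image_mb * containers))
# With a union filesystem, the image is stored once plus one thin layer each.
shared=$((image_mb + thin_mb * containers))

echo "full copies:   ${full_copy} MB"
echo "shared layers: ${shared} MB"
```

Under these assumptions the shared layout needs 143 MB instead of 1330 MB, and the gap widens with every additional container.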

How does Unified Filesystem work?

Now you may well ask an important question: how does it all work in practice? From the above, one might get the impression that a union filesystem relies on some kind of black magic, but in fact it does not. Let me try to explain how it works in the general (non-containerized) case. Suppose we want to merge two directories (upper and lower) at a single mount point so that they are presented as one unified view:

.
├── upper
│   ├── code.py  # Content: `print("Hello Overlay!")`
│   └── script.py
└── lower
    ├── code.py  # Content: `print("some coding")`
    └── config.yaml

In union mount terminology, such directories are called branches. Each branch is assigned a different priority. The priority decides which file is shown in the merged view when files with the same name exist in more than one source branch. Looking at the files and directories above, it is clear that such a conflict arises for code.py if we overlay them. Let’s see what we get:

~ $ mount -t overlay \
    -o lowerdir=./lower,upperdir=./upper,workdir=./workdir \
    overlay /mnt/merged
~ $ ls /mnt/merged
code.py  config.yaml  script.py
~ $ cat /mnt/merged/code.py
print("Hello Overlay!")

In the above example, we used the mount command with type overlay to combine the lower directory (read-only; lower priority) and the upper directory (read-write; higher priority) into a merged view at /mnt/merged. We also passed the option workdir=./workdir. This directory, which must be empty and on the same filesystem as upperdir, serves as a staging area where files are prepared before they are moved into the merged view.

If you look at the output of the cat command, you will notice that the file from the upper directory has taken precedence in the merged view.
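The precedence rule can be imitated without mounting anything (and therefore without root). The sketch below is only a simulation: resolve is a hypothetical helper of my own, not part of OverlayFS or Docker, and it simply searches the branches from highest to lowest priority.

```shell
#!/bin/sh
# resolve NAME BRANCH...: hypothetical helper that mimics union-mount lookup.
# Branches are searched in the order given (highest priority first) and the
# path of the first match is printed.
resolve() {
    name=$1
    shift
    for branch in "$@"; do
        if [ -e "$branch/$name" ]; then
            printf '%s\n' "$branch/$name"
            return 0
        fi
    done
    return 1
}

# Recreate the example tree in a scratch directory.
demo=$(mktemp -d)
mkdir "$demo/upper" "$demo/lower"
echo 'print("Hello Overlay!")' > "$demo/upper/code.py"
: > "$demo/upper/script.py"
echo 'print("some coding")' > "$demo/lower/code.py"
: > "$demo/lower/config.yaml"

# upper is listed first, so it wins the conflict over code.py.
cat "$(resolve code.py "$demo/upper" "$demo/lower")"
# config.yaml exists only in lower, so the lookup falls through to it.
resolve config.yaml "$demo/upper" "$demo/lower"
```

The real filesystem does this inside the kernel, of course; the point is only that the merged view is a first-match search through an ordered list of branches.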

We now know how to merge two directories and what happens when a conflict occurs. But what happens if you try to modify files in the merged view? This is where copy-on-write (CoW) comes into play. What exactly does it do? CoW is an optimization technique: if two callers ask for the same resource, you can give both a pointer to the same resource without copying it. A copy is made only when one of the callers tries to write to its “copy”. Hence the name: the copy happens on the first write.

In the case of a union mount, this means that if we try to modify a shared (or read-only) file, it is first copied up to the writable upper branch (upperdir), which has a higher priority than the read-only lower branches (lowerdir). Once the file is on the writable branch, it can be safely modified, and its new contents are shown in the merged view because the upper layer takes precedence.
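Here is a minimal simulation of that copy-up step using plain directories. The write_file helper and the file contents are my own inventions for illustration; a real union filesystem performs the copy transparently inside the kernel.

```shell
#!/bin/sh
# write_file UPPER LOWER NAME TEXT: hypothetical helper imitating copy-up.
write_file() {
    upper=$1 lower=$2 name=$3 text=$4
    # Copy up: the first write to a file that exists only in the lower
    # branch copies it into the upper branch.
    if [ ! -e "$upper/$name" ] && [ -e "$lower/$name" ]; then
        cp -p "$lower/$name" "$upper/$name"
    fi
    # The change itself always lands in the upper branch.
    printf '%s\n' "$text" >> "$upper/$name"
}

cow=$(mktemp -d)
mkdir "$cow/upper" "$cow/lower"
echo 'debug: false' > "$cow/lower/config.yaml"

write_file "$cow/upper" "$cow/lower" config.yaml 'port: 8080'

cat "$cow/lower/config.yaml"   # still the original single line
cat "$cow/upper/config.yaml"   # original line plus the appended one
```

The lower branch is never touched, which is exactly why many containers can share it safely.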

The last operation we might want to perform is deleting files. To “delete” a file, a whiteout file is created on the writable branch to mask the “deleted” file. The file is not physically removed; it is merely hidden in the merged view.
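Whiteouts can be sketched in the same spirit. OverlayFS itself represents a whiteout as a character device with device number 0/0, which needs root to create, so this simulation uses an ordinary marker file instead. The .wh. prefix is an assumption borrowed from the aufs convention, and both helpers are hypothetical.

```shell
#!/bin/sh
# delete_file UPPER NAME: hide NAME without touching the lower branch.
delete_file() {
    rm -f "$1/$2"          # drop any copy in the writable branch
    : > "$1/.wh.$2"        # leave a whiteout marker behind
}

# is_visible UPPER LOWER NAME: succeed if NAME shows up in the merged view.
is_visible() {
    [ -e "$1/.wh.$3" ] && return 1      # a whiteout hides lower copies
    [ -e "$1/$3" ] || [ -e "$2/$3" ]
}

wh=$(mktemp -d)
mkdir "$wh/upper" "$wh/lower"
: > "$wh/lower/config.yaml"

delete_file "$wh/upper" config.yaml

is_visible "$wh/upper" "$wh/lower" config.yaml || echo "config.yaml is hidden"
ls "$wh/lower"    # the file is still physically present in the lower branch
```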

We’ve talked a lot about the principles of union mounts, but how do these principles apply to Docker and its containers? Let’s take a look at Docker’s layered architecture. A container sandbox is made up of multiple image branches, or layers, as they are usually called.

Terminological differences aside, we are talking about the same thing: the image layers pulled from the registry form the lowerdir, and when a container is launched, an upperdir is attached on top of the image layers, giving the container a writable work area. Sounds pretty simple, doesn’t it? Let’s check how everything works!

Checking

To show how Docker uses the OverlayFS file system, let’s try to simulate the process of mounting Docker containers and image layers. Before we get started, we need to first clear the workspace and get an image to work with:

~ $ docker image prune -af
...
Total reclaimed space: ...MB
~ $ docker pull nginx
Using default tag: latest
latest: Pulling from library/nginx
a076a628af6f: Pull complete
0732ab25fa22: Pull complete
d7f36f6fe38f: Pull complete
f72584a26f32: Pull complete
7125e4df9063: Pull complete
Digest: sha256:10b8cc432d56da8b61b070f4c7d2543a9ed17c2b23010b43af434fd40e2ca4aa
Status: Downloaded newer image for nginx:latest
docker.io/library/nginx:latest

So, we have an image (nginx) to work with; next, we need to check its layers. You can check the image layers either by running docker inspect on the image and examining the GraphDriver fields, or by going to the /var/lib/docker/overlay2 directory, which contains all the image layers. Let’s do both and see what happens:

~ $ cd /var/lib/docker/overlay2
~ $ ls -l
total 0
~ $ tree 3d963d191b2101b3406348217f4257d7374aa4b4a73b4a6dd4ab0f365d38dfbd/
3d963d191b2101b3406348217f4257d7374aa4b4a73b4a6dd4ab0f365d38dfbd/
├── diff
│   └── docker-entrypoint.d
│       └── 20-envsubst-on-templates.sh
├── link
├── lower
└── work
~ $ docker inspect nginx | jq '.[0].GraphDriver.Data'
{
  "LowerDir": "/var/lib/docker/overlay2/fb18be50518ec9b37faf229f254bbb454f7663f1c9c45af9f272829172015505/diff:
    /var/lib/docker/overlay2/d487622ece100972afba76fda13f56029dec5ec26ffcf552191f6241e05cab7e/diff:
    /var/lib/docker/overlay2/685374e39a6aac7a346963bb51e2fc7b9f5e2bdbb5eac6c76ccdaef807abc25e/diff:
    /var/lib/docker/overlay2/410c05aaa30dd006fc47d8c23ba0d173c6d305e4d93fdc3d9abcad9e78862b46/diff",
  "MergedDir": "/var/lib/docker/overlay2/3d963d191b2101b3406348217f4257d7374aa4b4a73b4a6dd4ab0f365d38dfbd/merged",
  "UpperDir": "/var/lib/docker/overlay2/3d963d191b2101b3406348217f4257d7374aa4b4a73b4a6dd4ab0f365d38dfbd/diff",
  "WorkDir": "/var/lib/docker/overlay2/3d963d191b2101b3406348217f4257d7374aa4b4a73b4a6dd4ab0f365d38dfbd/work"
}

If you look closely at this output, you will notice that it closely resembles the options we passed to the mount command earlier, don’t you think?

  • LowerDir: The directories of the read-only image layers, separated by colons.

  • MergedDir: A merged view of all layers in the image and container.

  • UpperDir: Read/write layer on which changes are written.

  • WorkDir: The working directory used by Linux OverlayFS to prepare the merged view.
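Because LowerDir is a single colon-separated string, it can be handy to split it into one layer per line. The sketch below does this with tr; list_layers is a hypothetical helper, and the shortened layer IDs (aaa, bbb, ccc) are made up to keep the example readable. In OverlayFS the leftmost entry is the topmost layer and the rightmost is the bottom one.

```shell
#!/bin/sh
# list_layers LOWERDIR: print one layer per line, topmost layer first.
list_layers() {
    printf '%s\n' "$1" | tr ':' '\n'
}

# Shortened, made-up layer IDs stand in for the real 64-character ones.
lowerdir="/var/lib/docker/overlay2/aaa/diff:/var/lib/docker/overlay2/bbb/diff:/var/lib/docker/overlay2/ccc/diff"

list_layers "$lowerdir"
# Count the layers.
list_layers "$lowerdir" | wc -l
```

On a real system you would feed it the LowerDir value taken from the docker inspect output instead of the sample string.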

Let’s take one more step – launch the container and examine its layers:

~ $ docker run -d --name container nginx
~ $ docker inspect container | jq '.[0].GraphDriver.Data'
{
  "LowerDir": "/var/lib/docker/overlay2/59bcd145c580de3bb3b2b9c6102e4d52d0ddd1ed598e742b3a0e13e261ee6eb4-init/diff:
    /var/lib/docker/overlay2/3d963d191b2101b3406348217f4257d7374aa4b4a73b4a6dd4ab0f365d38dfbd/diff:
    /var/lib/docker/overlay2/fb18be50518ec9b37faf229f254bbb454f7663f1c9c45af9f272829172015505/diff:
    /var/lib/docker/overlay2/d487622ece100972afba76fda13f56029dec5ec26ffcf552191f6241e05cab7e/diff:
    /var/lib/docker/overlay2/685374e39a6aac7a346963bb51e2fc7b9f5e2bdbb5eac6c76ccdaef807abc25e/diff:
    /var/lib/docker/overlay2/410c05aaa30dd006fc47d8c23ba0d173c6d305e4d93fdc3d9abcad9e78862b46/diff",
  "MergedDir": "/var/lib/docker/overlay2/59bcd145c580de3bb3b2b9c6102e4d52d0ddd1ed598e742b3a0e13e261ee6eb4/merged",
  "UpperDir": "/var/lib/docker/overlay2/59bcd145c580de3bb3b2b9c6102e4d52d0ddd1ed598e742b3a0e13e261ee6eb4/diff",
  "WorkDir": "/var/lib/docker/overlay2/59bcd145c580de3bb3b2b9c6102e4d52d0ddd1ed598e742b3a0e13e261ee6eb4/work"
}
~ $ tree -L 4 /var/lib/docker/overlay2/59bcd145c580de3bb3b2b9c6102e4d52d0ddd1ed598e742b3a0e13e261ee6eb4/diff
/var/lib/docker/overlay2/59bcd145c580de3bb3b2b9c6102e4d52d0ddd1ed598e742b3a0e13e261ee6eb4/diff
├── etc
│   └── nginx
│       └── conf.d
│           └── default.conf
├── run
│   └── nginx.pid
└── var
    └── cache
        └── nginx
            ├── client_temp
            ├── fastcgi_temp
            ├── proxy_temp
            ├── scgi_temp
            └── uwsgi_temp

From the above output, it follows that the layer which docker inspect nginx previously listed under MergedDir, UpperDir, and WorkDir (ID 3d963d191b2101b3406348217f4257d7374aa4b4a73b4a6dd4ab0f365d38dfbd) is now part of the container’s LowerDir. In our case, the LowerDir is composed of all layers of the nginx image, stacked on top of each other. On top of them is the writable layer in UpperDir, which contains the /etc, /run, and /var directories. Finally, MergedDir, listed above, shows the entire filesystem available to the container, combining the contents of UpperDir and LowerDir.

Image 1: Docker internals (unified filesystem)

Finally, to emulate Docker’s behaviour, we can use these same directories to create our own merged view manually:

~ $ mount -t overlay -o \
lowerdir=/var/lib/docker/overlay2/59bcd145c580de3bb3b2b9c6102e4d52d0ddd1ed598e742b3a0e13e261ee6eb4-init/diff:\
/var/lib/docker/overlay2/3d963d191b2101b3406348217f4257d7374aa4b4a73b4a6dd4ab0f365d38dfbd/diff:\
/var/lib/docker/overlay2/fb18be50518ec9b37faf229f254bbb454f7663f1c9c45af9f272829172015505/diff:\
/var/lib/docker/overlay2/d487622ece100972afba76fda13f56029dec5ec26ffcf552191f6241e05cab7e/diff:\
/var/lib/docker/overlay2/685374e39a6aac7a346963bb51e2fc7b9f5e2bdbb5eac6c76ccdaef807abc25e/diff:\
/var/lib/docker/overlay2/410c05aaa30dd006fc47d8c23ba0d173c6d305e4d93fdc3d9abcad9e78862b46/diff,\
upperdir=/var/lib/docker/overlay2/59bcd145c580de3bb3b2b9c6102e4d52d0ddd1ed598e742b3a0e13e261ee6eb4/diff,\
workdir=/var/lib/docker/overlay2/59bcd145c580de3bb3b2b9c6102e4d52d0ddd1ed598e742b3a0e13e261ee6eb4/work \
overlay /mnt/merged
~ $ ls /mnt/merged
proc  run   srv  tmp  var
 root  sbin  sys  usr
~ $ umount /mnt/merged

In our case, we just took the values from the previous code snippet and passed them as the corresponding arguments to the mount command. The only difference is that for the merged view, instead of /var/lib/docker/overlay2/…/merged, we used /mnt/merged.

This is what Docker’s use of OverlayFS comes down to: a single mount command applied to multiple layers stacked on top of each other.
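Typing such a long option string by hand is error-prone, so here is a small sketch that assembles it from a list of layers. overlay_opts is a hypothetical helper and the /tmp paths are placeholders, not real Docker directories; the mount call itself is left as a comment because it needs root and existing directories.

```shell
#!/bin/sh
# overlay_opts UPPER WORK LOWER...: hypothetical helper that prints the
# -o argument for mount. Lower layers are given topmost first.
overlay_opts() {
    upper=$1 work=$2
    shift 2
    lower=$1
    shift
    # Join the remaining lower layers with colons.
    for layer in "$@"; do
        lower="$lower:$layer"
    done
    printf 'lowerdir=%s,upperdir=%s,workdir=%s\n' "$lower" "$upper" "$work"
}

opts=$(overlay_opts /tmp/c/diff /tmp/c/work /tmp/l3/diff /tmp/l2/diff /tmp/l1/diff)
echo "$opts"
# With real directories in place, the mount itself would be (root required):
# mount -t overlay -o "$opts" overlay /mnt/merged
```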

Conclusion

Docker’s interface is like a black box with a lot of technology hidden inside. These technologies, obscure as they may be, are quite interesting and useful. I do not mean to say that you need to know all their subtleties to use Docker effectively, but in my opinion, spending a little time to understand how they work will only benefit you. A clear understanding of how a tool works makes it easier to make the right decisions; in our case, that means decisions about performance and security. In addition, you get to familiarize yourself with some advanced technologies, and who knows where that knowledge may be useful to you in the future!

In this article, we’ve only covered one part of the Docker architecture: the filesystem. There are other parts worth a closer look, such as cgroups and Linux namespaces. Once you master them, you can start thinking about a move into the in-demand field of DevOps.

References

Image 1 – https://www.google.com/url?sa=i&url=https%3A%2F%2Fwww.slideshare.net%2FRohitJnagal%2Fdocker-internals&psig=AOvVaw3adCOtly17Sw0CBbHwgypy&ust=1630810060379000&source=images&cd=vfe&ved=0CAsQjRxqFwoTCOCiwIan5PICFQAAAAAdAAAAABAU

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.
