What is the difference between a process, a container, and a VM?
Many people ask “What is the difference between a VM and a container?”, but in my opinion a more interesting question is …
“What is the difference between a process and a container?”
When containers started gaining popularity back in 2013, it was common to hear “Containers are like mini VMs”. This statement made sense because people were using containers instead of VMs, but in my opinion a more technically accurate statement is that a container is a process.
This post describes what a process is, what a container is, and what a VM is, and then compares the three.
TL;DR
In reality, a container is like a VM and also not like a VM. It just depends on what you are evaluating.
Here we evaluate two different things to show this:
- What problem each technology solves and how the end user/end system interacts with it. This addresses how “containers are like a VM”.
- How the implementation of each technology differs, mainly how each one is isolated from the operating system. This addresses how “containers are not like a VM”.
Overview
The following sections of this post include:
- What is a process, and why do we need it?
- What is a container, and what is it used for?
- What is a virtual machine, and what are its use cases?
- How do a process, a container, and a virtual machine compare?
What is a process?
A process represents a running program; it is an instance of an executing program. A process consists of memory and a set of data structures. The kernel uses these data structures to store important information about the state of the program.
What problem is a process solving? Why do we need them?
A CPU core can only execute one program at a time, so the operating system must share it among many programs and task switch between them. To resume a program later, the system needs to remember where it left off in the execution of that program (among other things). The process is the abstraction that stores the state of the running program.
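To make the "remember where it left off" idea concrete, here is a toy sketch. It is not how a kernel actually works: Python generators stand in for programs, and the names ToyProcess, count_to, and round_robin are made up for this illustration. Each toy process keeps its own paused state, and the scheduler gives every process one step before switching to the next.

```python
from collections import deque
from dataclasses import dataclass
from typing import Generator, List


@dataclass
class ToyProcess:
    """Hypothetical stand-in for the kernel's per-process bookkeeping."""
    pid: int
    program: Generator[str, None, None]  # the paused generator is the saved state
    steps_run: int = 0


def count_to(name: str, n: int) -> Generator[str, None, None]:
    """A tiny 'program' that does one unit of work per step."""
    for i in range(1, n + 1):
        yield f"{name}:{i}"


def round_robin(processes: List[ToyProcess]) -> List[str]:
    """Run one step of each process in turn until all programs finish."""
    ready = deque(processes)
    trace = []
    while ready:
        proc = ready.popleft()
        try:
            # Resume the program exactly where it was paused last time.
            trace.append(next(proc.program))
            proc.steps_run += 1
            ready.append(proc)  # not finished yet, back of the queue
        except StopIteration:
            pass  # the program finished, so the process is dropped
    return trace


if __name__ == "__main__":
    procs = [ToyProcess(1, count_to("A", 3)), ToyProcess(2, count_to("B", 2))]
    print(round_robin(procs))
```

The interleaved trace shows both programs making progress on a single "CPU", which only works because each process carries its own saved state.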
Isolation for a process
By default a process has pretty minimal isolation from the operating system resources. For example, you can easily get an error if you try to run multiple services on the same port. There are two main things that are isolated by default for processes:
- A process gets its own memory space.
- A process has restricted privileges. A process gets the same privileges as the user that created the process.
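The two default isolations above can be observed with a small experiment. This is a minimal sketch assuming a Unix-like system where os.fork is available; the function name fork_and_modify and the counter variable are mine, chosen for illustration. The child changes a variable in its own copy of memory, and the parent checks that its copy is untouched and that the child ran as the same user.

```python
import os

counter = 0  # lives in this process's private memory


def fork_and_modify():
    """Fork a child that changes `counter`.

    Returns (value seen by child, value seen by parent, child has same uid).
    """
    global counter
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: it works on its own copy of the parent's memory.
        os.close(read_fd)
        counter += 100
        os.write(write_fd, f"{counter} {os.getuid()}".encode())
        os._exit(0)  # exit right away without running the parent's cleanup code
    # Parent: read what the child reported, then reap the child.
    os.close(write_fd)
    child_counter, child_uid = os.read(read_fd, 64).decode().split()
    os.close(read_fd)
    os.waitpid(pid, 0)
    return int(child_counter), counter, int(child_uid) == os.getuid()


if __name__ == "__main__":
    print(fork_and_modify())
```

The pipe is only there so the child can report back; without some explicit channel like it, the parent has no view into the child's memory at all.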
What is a container?
There are a bunch of definitions of what a container is.
Nigel Poulton’s definition is an “Isolated area of an OS with resource limits usage applied.”
Wikipedia’s definition says “containers is a generic term for an operating system level virtualization. … There are a number of such implementations including Docker, lxc, and rkt.”
The UNIX and Linux System Administration Handbook describes a container as an isolated group of processes that are restricted to a private root filesystem and process namespace.
My personal definition of a container is a group of processes with some cool kernel features sprinkled on top that allow the processes to pretend that they’re running on their own separate machine. While the host machine knows that the container is actually a process, the container thinks that it is a separate machine. These awesome kernel features that make this possible are:
- namespaces = Namespaces are the feature that makes the container look and feel like it is an entirely separate machine.
- cgroups = A way to group processes together in the kernel and limit resources for that grouping. These were developed at Google in 2006 and were first called “process containers”.
- capabilities = A list of the superuser privileges that can be enabled or disabled for a process.
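To give a feel for capabilities in particular, here is a small sketch that decodes a capability bitmask. On Linux, /proc/self/status reports a process's effective capabilities as a hex mask on the CapEff line. The bit positions below are a handful taken from linux/capability.h, not the full list, and the function names are my own.

```python
from typing import List, Optional

# A few capability bit positions as defined in <linux/capability.h>.
# This is deliberately a partial table, for illustration only.
CAPABILITY_BITS = {
    0: "CAP_CHOWN",
    5: "CAP_KILL",
    10: "CAP_NET_BIND_SERVICE",
    12: "CAP_NET_ADMIN",
    21: "CAP_SYS_ADMIN",
}


def decode_capabilities(mask_hex: str) -> List[str]:
    """Return the names (from the partial table) whose bits are set in the mask."""
    mask = int(mask_hex, 16)
    return [name for bit, name in sorted(CAPABILITY_BITS.items()) if mask & (1 << bit)]


def effective_capabilities(status_path: str = "/proc/self/status") -> Optional[List[str]]:
    """Decode this process's CapEff line, or return None if it is unavailable."""
    try:
        with open(status_path) as status:
            for line in status:
                if line.startswith("CapEff:"):
                    return decode_capabilities(line.split()[1])
    except OSError:
        pass
    return None


if __name__ == "__main__":
    print(effective_capabilities())
```

Running this as a regular user and then as root (or inside a container) should show different sets, which is the point: superuser power is split into pieces that can be switched on or off per process.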
My favorite kernel features are the namespaces. This post covers 7 different Linux namespaces, each for a different resource (kernels since Linux 5.6 also have a time namespace). The Linux man page has a great description for a namespace:
A namespace wraps a global system resource in an abstraction that makes it appear to the processes within the namespace that they have their own isolated instance of the global resource.
Namespaces are my favorite because they allow the container to have the look and feel of being its own separate machine. The 7 different types of namespaces relate to 7 different resources that get their own isolated instance in a container:
- Cgroup — isolates the cgroup root directory
- IPC — isolates interprocess communication
- Network — isolates the network stack
- Mount — isolates mount points
- PID — isolates process IDs
- User — isolates User and Group IDs
- UTS — isolates hostnames and domain names
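You can see which namespaces a process belongs to without any container tooling. The sketch below assumes a Linux system with /proc mounted; the function name current_namespaces is mine. Each entry under /proc/self/ns is a symbolic link whose target looks something like pid:[4026531836], and two processes are in the same namespace when those identifiers match.

```python
import os
from typing import Dict


def current_namespaces(ns_dir: str = "/proc/self/ns") -> Dict[str, str]:
    """Map each namespace entry to its identifier; {} if the directory is unavailable."""
    try:
        names = sorted(os.listdir(ns_dir))
    except OSError:
        return {}  # not Linux, or /proc is not mounted
    result = {}
    for name in names:
        try:
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # skip entries we are not allowed to read
    return result


if __name__ == "__main__":
    for name, identifier in current_namespaces().items():
        print(f"{name:>20} -> {identifier}")
```

Comparing the output on the host with the output from inside a container is a nice way to see that the container's processes really do live in different namespaces.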
If you want to get a deeper explanation of this content, I highly recommend watching the “What is a Container?” section on Wes Higbee’s course Containers and Images: The Big Picture.
What problem is a container solving?
Containers allow many applications to run on one server, each in a pseudo-isolated environment. The container pretends to be its own operating system: it can run a group of processes in what it thinks is an isolated environment. Since the container shares the host machine’s kernel, it has less resource overhead than, say, a VM.
Container Isolation
As a recap, a container is a group of processes placed into a cgroup and a set of namespaces. The cgroup limits what resources (e.g. CPU, memory) are available to the group, while the namespaces create isolated instances of seven different resources (e.g. network stack, hostname, mounts), giving the container the impression it’s a separate operating system.
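To see the cgroup side of this, Linux exposes a process's cgroup membership in /proc/self/cgroup, one line per hierarchy in the form hierarchy-ID:controller-list:cgroup-path. Here is a minimal parsing sketch. The CgroupEntry type, the function name, and the sample paths are hypothetical and only there for illustration.

```python
from typing import List, NamedTuple


class CgroupEntry(NamedTuple):
    hierarchy_id: int
    controllers: List[str]  # empty for the unified (cgroup v2) hierarchy
    path: str


def parse_proc_cgroup(text: str) -> List[CgroupEntry]:
    """Parse the contents of a /proc/<pid>/cgroup file into entries."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first two colons only, since the path may contain colons.
        hierarchy_id, controllers, path = line.split(":", 2)
        entries.append(
            CgroupEntry(int(hierarchy_id), [c for c in controllers.split(",") if c], path)
        )
    return entries


if __name__ == "__main__":
    # Hypothetical sample: one cgroup v1 style line and one cgroup v2 style line.
    sample = "4:cpu,cpuacct:/demo\n0::/demo\n"
    for entry in parse_proc_cgroup(sample):
        print(entry)
```

On a real system you would pass in the text read from /proc/self/cgroup; the path it reports tells you which cgroup's limits apply to the process.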
What is a virtual machine (VM)?
A “virtual machine” was originally defined by Popek and Goldberg as “an efficient, isolated duplicate of a real computer machine.” VMs are a type of server virtualization. When talking about VMs there are two important parts: 1) the hypervisor and 2) the actual VM (aka Guest OS). The hypervisor is the software that runs the VMs. It provides a layer of abstraction between the hardware and the VM, allowing for more portability and better use of the host’s hardware resources.
What problem is a VM solving?
VMs make it possible to run many different types of operating system instances on a single machine. Additionally, VMs make it possible to run multiple applications on one server in a safe and secure manner, making more efficient use of the computer’s physical resources. Virtualization makes it possible to convert the physical hardware into a shareable form. Back in the day before VMs, businesses typically ran one application per server. This meant there would often be tons of idle CPU on these servers, over 90% idle sometimes! The VM technology made it possible to run many VMs, and therefore many applications, on one server and reduce unused system resources.
VM Isolation
A VM has full isolation from the host’s operating system; it only shares the hardware. This is a much higher level of isolation than processes and containers get, since both of those rely on the host OS. Keep in mind that this full isolation comes with a tradeoff: a VM is more isolated and more secure, but it uses more of the host’s resources to run its own OS.
If you want a great explanation of this, I highly recommend the “What are Containers?” section on Nigel Poulton’s course Docker Containers — The Big Picture.
Process vs Container vs VM
If we are evaluating the technology implementation, it seems more appropriate to say a container is more like a process than a VM.
However, if we are talking about the use case and how an end user interacts with the product, a container is more like a VM: both solve the same problem of providing isolated environments to run many applications on one server.
Looking through the lens of how the technology is implemented, the level of isolation from the host system is the main aspect to evaluate:
- Processes have little default isolation at the operating system (OS) level: mainly an isolated memory space and restricted user privileges.
- A container is a process (or a group of processes) with more isolation from the OS than your run-of-the-mill process, but with less isolation than a VM, which comes with the tradeoff of less security.
- Virtual Machines have full isolation at the OS level, meaning they run a complete, separate operating system on top of the host’s hardware. The full isolation comes at the tradeoff of more resource usage to run a VM.
Looking through the lens of what problem the technology is solving, the thing to evaluate is how the end user/end system interacts with it:
- Process: The operating system needs a construct to store state about running programs; that is what a process is.
- Containers: Create isolated environments to run applications.
- VMs: Provide a way to run different operating systems on the same host machine and in turn run many applications in fully isolated environments.
So it turns out a container is similar to a process and similar to a VM, but also different from both. It depends on what you are looking at.
References
- UNIX and Linux System Administration Handbook (5th Edition).
- Wes Higbee’s course: Containers and Images: The Big Picture.
- The Linux man pages: namespaces, cgroups, and capabilities.
- The Linux Programming Interface (book).
- Nigel Poulton’s courses: The Big Picture and Docker Deep Dive.