I keep rereading the Docker documentation to try to understand the difference between Docker and a full VM. How does it manage to provide a full filesystem, isolated networking environment, etc. without being as heavy?
Why is deploying software to a Docker image (if that's the right term) easier than simply deploying to a consistent production environment?
I have used Docker extensively in production and staging environments. Once you get used to it, you will find it very powerful for building multi-container, isolated environments.
Docker was developed based on LXC (Linux Containers) and works perfectly on many Linux distributions, especially Ubuntu.
Docker containers are isolated environments. You can see this when you issue the `top` command in a Docker container that has been created from a Docker image. Besides that, they are very lightweight and flexible thanks to the Dockerfile configuration.
For example, you can create a Docker image by configuring a Dockerfile that says, in effect: when building or running this image, wget 'this', apt-get 'that', run 'some shell script', set these environment variables, and so on.
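As a minimal sketch of that idea (the package, script name, and image tag below are placeholders, not from any real project):

```sh
# Write a minimal Dockerfile (assumes a start.sh script sits next to it).
cat > Dockerfile <<'EOF'
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y wget    # "apt-get 'that'"
COPY start.sh /usr/local/bin/start.sh            # "run 'some shell script'"
RUN chmod +x /usr/local/bin/start.sh
ENV APP_ENV=staging                              # "setting environment variables"
CMD ["/usr/local/bin/start.sh"]
EOF

# Build an image from it and run a container.
docker build -t my-app:latest .
docker run --rm my-app:latest
```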
In microservices projects and architectures, Docker is a very valuable asset. You can achieve scalability, resiliency, and elasticity with Docker, Docker Swarm, Kubernetes, and Docker Compose.
Another important advantage of Docker is Docker Hub and its community. For example, I implemented an ecosystem for monitoring Kafka using Prometheus, Grafana, the Prometheus JMX Exporter, and Docker.
To do that, I downloaded preconfigured Docker images for ZooKeeper, Kafka, Prometheus, Grafana, and the JMX collector, then mounted my own configuration for some of them using YAML files, and for others changed some files and configuration inside the container. I built a whole Kafka monitoring system out of multiple Docker containers on a single machine, with isolation, scalability, and resiliency, and that architecture can easily be moved onto multiple servers.
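To give a rough idea of what wiring a stack like that together looks like (the image names, tags, ports, and mounted file below are illustrative assumptions, not the exact setup described above):

```sh
# A user-defined bridge network lets the containers reach each other by name.
docker network create kafka-monitoring

# Pull and start the pieces (image names/tags are examples only;
# a Kafka container and the JMX exporter would be started the same way).
docker run -d --name zookeeper --network kafka-monitoring zookeeper
docker run -d --name grafana   --network kafka-monitoring -p 3000:3000 grafana/grafana

# Prometheus with a custom configuration mounted from the host
# (assumes a prometheus.yml in the current directory).
docker run -d --name prometheus --network kafka-monitoring \
  -p 9090:9090 \
  -v "$(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml" \
  prom/prometheus
```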
Besides the Docker Hub site, there is another registry called quay.io where you can keep a dashboard of your own Docker images and pull/push to/from it. You can even import Docker images from Docker Hub into Quay and then run them from Quay on your own machine.
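Moving an image from Docker Hub to Quay is essentially a pull, retag, and push (the username and repository below are placeholders):

```sh
docker pull ubuntu:22.04                                    # get the image from Docker Hub
docker tag ubuntu:22.04 quay.io/<your-user>/ubuntu:22.04    # retag it for the Quay registry
docker login quay.io                                        # authenticate against Quay
docker push quay.io/<your-user>/ubuntu:22.04                # push it to your Quay repository
docker run --rm -it quay.io/<your-user>/ubuntu:22.04        # and run it from Quay
```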
Note: Learning Docker seems complex and hard at first, but once you get used to it you cannot work without it.
I remember my first days of working with Docker, when I issued the wrong commands or mistakenly removed my containers along with all of their data and configuration.
Good answers. Just to get a visual representation of a container vs. a VM, have a look at the image below.
Most of the answers here talk about virtual machines. I'm going to give you a one-liner response to this question that has helped me the most over the last couple of years of using Docker. It's this:
Docker is just a fancy way to run a process, not a virtual machine.
Now, let me explain a bit more about what that means. Virtual machines are their own beast. I feel like explaining what Docker is will help you understand this more than explaining what a virtual machine is. Especially because there are many fine answers here telling you exactly what someone means when they say "virtual machine". So...
A Docker container is just a process (and its children) that is compartmentalized from the rest of the processes using the host system's kernel features (namespaces and cgroups). You can actually see your Docker container processes by running `ps aux` on the host. For example, starting `apache2` "in a container" is just starting `apache2` as a special process on the host. It's just been compartmentalized from other processes on the machine. It is important to note that your containers do not exist outside of your containerized process's lifetime. When your process dies, your container dies. That's because Docker replaces `pid 1` inside your container with your application (`pid 1` is normally the init system). This last point about `pid 1` is very important.
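You can see this for yourself: start a long-running process in a container and look for it on the host (the `sleep` command here is just a stand-in for a real service):

```sh
# Start a container whose main process (pid 1 inside the container) is just `sleep`.
docker run -d --name just-a-process alpine sleep 600

# On the host, that very same process shows up in the normal process list.
ps aux | grep 'sleep 600'

# Kill the process and the container is gone with it.
docker rm -f just-a-process
```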
As far as the filesystem used by each of those container processes, Docker uses UnionFS-backed images, which is what you're downloading when you do a `docker pull ubuntu`. Each "image" is just a series of layers and related metadata. The concept of layering is very important here. Each layer is just a change from the layer underneath it. For example, when you delete a file in your Dockerfile while building a Docker container, you're actually just creating a layer on top of the last layer which says "this file has been deleted". Incidentally, this is why you can delete a big file from your filesystem, but the image still takes up the same amount of disk space: the file is still there, in the layers underneath the current one. Layers themselves are just tarballs of files. You can test this out with `docker save --output /tmp/ubuntu.tar ubuntu` and then `cd /tmp && tar xvf ubuntu.tar`. Then you can take a look around. All those directories that look like long hashes are actually the individual layers. Each one contains files (`layer.tar`) and metadata (`json`) with information about that particular layer. Those layers just describe changes to the filesystem, which are saved as a layer "on top of" its original state. When reading the "current" data, the filesystem reads data as though it were looking only at the top-most layers of changes. That's why the file appears to be deleted even though it still exists in "previous" layers: the filesystem is only looking at the top-most layers. This allows completely different containers to share their filesystem layers, even though some significant changes may have happened to the filesystem on the top-most layers in each container. This can save you a ton of disk space when your containers share their base image layers. However, when you mount directories and files from the host system into your container by way of volumes, those volumes "bypass" the UnionFS, so changes are not stored in layers.
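If you want to poke at the layers yourself, `docker history` gives a quick view without unpacking anything, and `docker save` lets you inspect the raw contents (assuming the `ubuntu` image can be pulled):

```sh
docker pull ubuntu        # download the UnionFS-backed image
docker history ubuntu     # list the layers and the instruction that created each one

# Or unpack the image into a directory and look at the raw layer archives and metadata.
docker save --output /tmp/ubuntu.tar ubuntu
mkdir -p /tmp/ubuntu && tar -xf /tmp/ubuntu.tar -C /tmp/ubuntu
ls /tmp/ubuntu
```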
Networking in Docker is achieved by using an Ethernet bridge (called `docker0` on the host) and virtual interfaces for every container on the host. It creates a virtual subnet in `docker0` for your containers to communicate "between" one another. There are many options for networking here, including creating custom subnets for your containers, and the ability to "share" your host's networking stack for your container to access directly.
Docker is moving very fast. Its documentation is some of the best documentation I've ever seen. It is generally well-written, concise, and accurate. I recommend you check the documentation available for more information, and trust the documentation over anything else you read online, including Stack Overflow. If you have specific questions, I highly recommend joining `#docker` on Freenode IRC and asking there (you can even use Freenode's webchat for that!).
They both are very different. Docker is lightweight and uses LXC/libcontainer (which relies on kernel namespacing and cgroups), and does not do machine/hardware emulation the way a hypervisor, KVM, or Xen does, which is heavy.
Docker and LXC are meant more for sandboxing, containerization, and resource isolation. They use the host OS's (currently only the Linux kernel's) clone API, which provides namespacing for IPC, mount (NS), network, PID, UTS, etc.
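As a quick illustration of that namespacing (assuming the `alpine` image is available or can be pulled), compare the host's view with a container's view:

```sh
# On the host: the full process table and the host's namespace IDs.
ps aux | head
ls -l /proc/self/ns

# Inside a container: your command is PID 1 and the namespace IDs are different.
docker run --rm alpine sh -c 'ps aux; ls -l /proc/self/ns'
```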
What about memory, I/O, CPU, etc.? That is controlled using cgroups, where you can create groups with certain resource (CPU, memory, etc.) specifications/restrictions and put your processes in them. On top of LXC, Docker provides a storage backend (http://www.projectatomic.io/docs/filesystems/), e.g. a union mount filesystem, where you can add layers and share layers between different mount namespaces.
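With Docker you normally don't touch cgroups directly; you pass resource flags and Docker sets the cgroup limits for you. A small sketch (using `nginx` purely as an example workload):

```sh
# Cap this container at 256 MB of RAM and one CPU; the limits are enforced via cgroups.
docker run -d --name limited --memory=256m --cpus=1 nginx

# Confirm what was applied.
docker inspect --format '{{.HostConfig.Memory}} {{.HostConfig.NanoCpus}}' limited
```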
This is a powerful feature: the base image layers are typically read-only, and only when the container modifies something does it write to a read-write layer (a.k.a. copy-on-write). Docker also provides many other wrappers, such as a registry and versioning of images.
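You can watch copy-on-write in action with `docker diff`, which lists only what a container has changed relative to its read-only image layers (the file path here is arbitrary):

```sh
docker run -d --name cow-demo ubuntu sleep 600   # start from a read-only image
docker exec cow-demo touch /tmp/hello            # modify something inside the container
docker diff cow-demo                             # shows only the added/changed paths
docker rm -f cow-demo
```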
With plain LXC you need to come up with a rootfs or share one, and when it is shared, the changes are reflected in other containers. Because of all these added features, Docker is more popular than LXC. LXC is popular in embedded environments for implementing security around processes exposed to external entities such as the network and UI. Docker is popular in cloud multi-tenancy environments where a consistent production environment is expected.
A normal VM (for example, VirtualBox or VMware) uses a hypervisor, and the related technologies either have dedicated firmware that becomes the first layer for the first OS (the host OS, or guest OS 0), or software that runs on the host OS to provide hardware emulation (CPU, USB/accessories, memory, network, etc.) to the guest OSes. VMs are still (as of 2015) popular in high-security multi-tenant environments.
Docker/LXC can be run on almost any cheap hardware (less than 1 GB of memory is fine as long as you have a newer kernel), whereas normal VMs need at least 2 GB of memory, etc., to do anything meaningful. Docker support on the host OS is not available on operating systems such as Windows (as of Nov 2014), whereas many kinds of VMs can be run on Windows, Linux, and Macs.
Here is a picture from docker/rightscale:
This is how Docker introduces itself:
So Docker is container-based, meaning you have images and containers which can be run on your current machine. Unlike a VM, a container does not include the operating system; instead it bundles different working packages such as Java, Tomcat, etc.
If you understand containers, you get what Docker is and how it's different from VMs...
So, what's a container?
So, as you see in the image below, each container has a separate package and runs on a single machine, sharing that machine's operating system... They are secure and easy to ship...
1. Lightweight
This is probably the first impression for many Docker learners.
First, Docker images are usually smaller than VM images, which makes them easy to build, copy, and share.
Second, Docker containers can start in several milliseconds, while a VM starts in seconds.
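You can get a feel for the start-up overhead yourself (assuming the `alpine` image has already been pulled, so no download time is included):

```sh
# Time how long it takes to create, start, run, and remove a throwaway container.
time docker run --rm alpine true
```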
2. Layered File System
This is another key feature of Docker. Images have layers, and different images can share layers, which makes them even more space-saving and faster to build.
If all containers use Ubuntu as their base image, not every image needs its own copy of the file system; they share the same underlying Ubuntu files and differ only in their own application layers.
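A quick way to see the sharing (the image names `my-app-a` and `my-app-b` are hypothetical images, both built FROM the same Ubuntu base):

```sh
docker history my-app-a   # lists that image's layers, including the shared Ubuntu base layers
docker history my-app-b   # the base layers show the same sizes/instructions; on disk they exist once
docker system df          # reports actual disk usage, counting shared layers a single time
```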
3. Shared OS Kernel
Think of containers as processes!
All containers running on a host are really a bunch of processes with different file systems. They share the same OS kernel and encapsulate only the system libraries and dependencies.
This is good for most cases (there is no extra OS kernel to maintain) but can be a problem if strict isolation between containers is necessary.
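A one-line check of the shared kernel (assuming an `alpine` image is handy): the kernel version inside the container is the host's, even though the userland is Alpine's.

```sh
uname -r                                      # kernel version on the host
docker run --rm alpine uname -r               # same kernel version, reported from "inside" the container
docker run --rm alpine cat /etc/os-release    # but a completely different userland/distribution
```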
Why does it matter?
All of these seem like improvements, not a revolution. Well, quantitative accumulation leads to qualitative transformation.
Think about application deployment. If we want to deploy new software (a service) or upgrade one, it is better to change the config files and processes than to create a new VM, because creating a VM with the updated service, testing it (sharing it between Dev and QA), and deploying it to production takes hours, even days. If anything goes wrong, you have to start again, wasting even more time. So using a configuration management tool (Puppet, SaltStack, Chef, etc.) to install new software and download new files is preferred.
When it comes to Docker, you simply use a newly created Docker container to replace the old one. Maintenance is much easier! Building a new image, sharing it with QA, testing it, and deploying it only takes minutes (if everything is automated), or hours in the worst case. This is called immutable infrastructure: do not maintain (upgrade) software, create a new one instead, as sketched below.
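In practice that replacement is just a handful of commands (the image and container names here are placeholders, and a real setup would put a load balancer or orchestrator in front):

```sh
docker build -t myapp:v2 .                           # bake the new version into a fresh image
docker stop myapp && docker rm myapp                 # retire the old container; never patch it in place
docker run -d --name myapp -p 8080:8080 myapp:v2     # start a new container from the new image
```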
It transforms how services are delivered. We want applications, but we have to maintain VMs (which is a pain and has little to do with our applications). Docker lets you focus on the applications and smooths everything out.