Docker has been creating a lot of buzz in the development community lately. This article will compare Docker to traditional virtual machines and hopefully define its sweet spot in a development life cycle. This is written from a Docker newbie’s perspective and will highlight some valuable lessons and tricks I have learned over the past few months.
When I coach people on running crowdsourced code challenges on Topcoder, there is one critical aspect I always emphasize to achieve good participation: environment setup should take no more than five minutes. Let me elaborate.
At any given time there may be hundreds of other challenges that members can choose from. If your challenge requires an hour of environment setup before a developer can begin the solution, they will most likely move on to the next challenge, where they can dive in right away.
If I am running a simple MEAN.io app, the setup instructions might be:
- git clone
- npm install
BAM! Thirty seconds later you’re up and running and ready to code your solution. It is very difficult to improve on this, and it might be hard to find a place for Docker in this scenario, but let’s expand on this example.
Let’s say you are a little further along in your life cycle and you want to split your web app with your api layer. Now you need to manage two code repositories and start them both and wire them together with env vars.
Let’s keep going and say you want some indexes created in Mongo and want to use Redis to manage sessions. OK, better yet, let’s say you want to swap out Mongo for PostgreSQL and now you have schemas you need to import. If you sat at the front of class at code school, you might start to see the dilemma. Without Docker you might be able to orchestrate this to some extent with a task runner like Grunt or Gulp. If you came from the Ruby camp, maybe you would use rake db:migrate to manage your database. You might even use foreman to start all your services. All these techniques are perfectly acceptable and we’ve been using them for many years, but now Docker and Docker Compose (formerly Fig) offer brand new paradigms.
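For instance, the pre-Docker foreman approach might look like this hypothetical Procfile (the script names are illustrative, not from a real project):

```
web: node web.js
api: node api.js
db: mongod
sessions: redis-server
```

One foreman start then launches all four processes, but it does nothing to install the packages, create the indexes, or import the schemas — that is the gap Docker fills.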
What is Docker?
When you find a new tool and you visit its site, it is sometimes difficult to weed through the marketing and understand what the tool really does. The following is my attempt to explain what Docker is from firsthand experience. I have never seen it explained this way, so take it with a grain of salt. I will assume that you already understand classic virtual machines, and I will compare Docker to what you already know.
In my opinion there are two major differences between Docker and a virtual machine like VMware.
Difference 1: The fundamental distinction between an image and a running container
Typically, when you use something like VMware, you start with an ISO of Ubuntu, or maybe you shop around for a pre-built image that offers just what you need. You run that image; it becomes a VM, and you start to install packages, configure it, and then use it.
At this point the original image becomes a distant memory. If you have done this before, you probably clone the VM before you customize it too much, so you always have that virgin foundation in case you need to build a similar machine. With a traditional VM this is merely a best practice you must remember to follow; with Docker, it is inherently part of the DNA.
With Docker you build your image (more on that below) and you start it as a container. The image is always saved and preserved. When you start a container, it runs a single command. This is a little hard to imagine at first, but it is very complementary to the microservice strategy that we’ve talked so much about, because you can launch a separate container for each process.
For example, your container might be run with node web.js or mongod or redis-server or node api.js. You might even start a container just to display a log file. By default a container’s state is persistent, but you can use the --rm switch to make it ephemeral. Not every container needs to originate from a unique image.
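A sketch of what those one-command containers might look like on the command line (the image and file names here are hypothetical, and these commands assume a running Docker daemon):

```shell
# each container runs exactly one process, detached in the background
docker run -d mywebimage node web.js
docker run -d myapiimage node api.js

# --rm makes the container ephemeral: it is removed as soon as its command exits
docker run --rm mywebimage cat /var/log/app.log
```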
For example, if you want to use a Cassandra cluster, you might start three Cassandra containers from the same image, using the --link option so they can communicate with each other. You can restart stopped containers and the state (even the data) will be preserved, or you can destroy the containers and start new ones. With just a few images you can build a full stack of services, each running in a separate container. You can even mount volumes from one container into another, or from the host system. These are very powerful concepts; however, they are a little more abstract than what we are used to. Once again, I believe this “container revolution” is driving the push toward microservices.
Difference 2: You are encouraged to build minimal images from layers of images rather than monolithic ones.
As I mentioned in the previous section, large monolithic images are a thing of the past, and Docker encourages you to build images from a recipe-like convention stored in a Dockerfile. The command docker build -t myfirstimage . will then create the image. All Dockerfiles start with the FROM directive, which names a base image found on Docker Hub. FROM ubuntu:latest is a good place to start. You can use the COPY directive to copy code from your host into the image during the build process, but more typically you would use the RUN directive to use the image’s OS to get the packages you want. For example:
    RUN apt-get update
    RUN apt-get install -y mongodb
    RUN apt-get install -y ssh

(The -y flag keeps apt-get from prompting for confirmation, which would hang a non-interactive build.)
This would give us a terse Ubuntu image that includes MongoDB and ssh. We would then add the default container command, called an ENTRYPOINT, so we could simply run the image to create a container running Mongo. The ENTRYPOINT might look something like ENTRYPOINT /usr/local/bin/mongod, and when we started the container, MongoDB would be running. Of course, all your configuration would also be done at build time using the ENV or RUN directives. Starting a container feels very different from starting a VM because there appears to be no boot process.
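Putting the pieces together, the Dockerfile sketched above might look like this (a minimal sketch — the package name and mongod path are assumptions that may vary by Ubuntu release):

```dockerfile
FROM ubuntu:latest

# install packages; -y avoids interactive prompts during the build
RUN apt-get update
RUN apt-get install -y mongodb
RUN apt-get install -y ssh

# configuration happens at build time, e.g. via ENV or RUN
RUN mkdir -p /data/db

# the default command the container runs when started
ENTRYPOINT ["/usr/bin/mongod", "--dbpath", "/data/db"]
```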
Tips on using Docker
Here are some lessons I have learned over the past few months that are worth sharing.
- docker images shows you images.
- docker ps shows your RUNNING containers (understand the difference between these two concepts and commands, since the output looks almost identical).
- docker ps -a shows ALL containers, even stopped ones (that can be restarted).
- docker run <image> starts a new container from an image running the default entry point.
- docker exec <container> </your/command> runs a command in an ALREADY RUNNING container. This is useful! For example: docker exec my_mongo_container /bin/ps will show the running processes inside your Mongo container; the command exits when it finishes, but the container keeps running.
- Docker runs on Linux, so if you are on a Mac you will need a small Docker virtual machine called Boot2Docker, which is VirtualBox under the covers. You should download the VirtualBox GUI from Oracle so you can tweak this type 2 hypervisor. I know this is confusing, but if you follow the instructions on docker.com it will make sense.
- Boot2Docker creates a virtual network adapter with the IP address 192.168.59.103, but your containers will be on a virtual network that is 172.17.0.x, so you won’t be able to reach them from your host (unless you are on Linux already). All the documentation gives you instructions to port forward; however, I found this simple trick to be more elegant: just add a route to your container network using your Boot2Docker host as a gateway, with the following command on your Mac: sudo route add 172.17.0.0/24 192.168.59.103
- Once you have added the above route you can run docker inspect <containerId> to get the IP address and hit the container directly.
- I recommend you name your images with the -t switch on docker build (and your containers with --name), and add the word ‘container’ or ‘image’ to the name to avoid confusion.
- Once you have a container you can convert it back to an image and preserve its state by using docker commit. For example I started with an Oracle image and ran it as a container. I then created the schemas in the container and loaded some sample data. I then committed this container as an image and uploaded it to Docker Hub. Now someone can do a docker pull kbowerma/sp5 and get an image that not only contains a working version of Oracle 11g on Ubuntu but also has my schema and sample data. This is super powerful!
- For the sake of brevity I have left out some important details you will need. This article is not a Docker how-to (there are plenty already) but is simply a conceptual comparison.
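To tie a few of these tips together, here is a sketch of the inspect-and-commit workflow described above (the container name is illustrative, and the commands assume a running Docker daemon):

```shell
# find the running container and its address on the 172.17.0.x network
docker ps
docker inspect --format '{{ .NetworkSettings.IPAddress }}' my_oracle_container

# freeze the container's current state (schemas, sample data) into a new image
docker commit my_oracle_container kbowerma/sp5

# share it on Docker Hub so others can docker pull it
docker push kbowerma/sp5
```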
Docker Compose (formerly Fig)
OK, this is really cool. Once you play with Docker and get a bunch of containers up and running and talking to each other, you learn that the commands to do this are somewhat lengthy and require a lot of switches. Docker Compose is a small binary that reads a YAML file, allowing you to orchestrate a complex set of containers that interact with each other.
Below is a simple docker-compose.yml file that will create and run three Cassandra containers as a cluster from a single image.
    cassnode1:
      build: .
      links:
        - cassnode2
        - cassnode3
      hostname: cassnode1
      command: -name "DataGuard Cluster" -seeds "cassnode1,cassnode2,cassnode3"
    cassnode2:
      build: .
      links:
        - cassnode3
      hostname: cassnode2
      command: -name "DataGuard Cluster" -seeds "cassnode1,cassnode2,cassnode3"
    cassnode3:
      build: .
      hostname: cassnode3
      command: -name "DataGuard Cluster" -seeds "cassnode1,cassnode2,cassnode3"
It assumes it is in the same directory as the Dockerfile, hence the build: . line. The really cool thing is that it will build the three containers and run them. Before it builds them, it checks whether they have already been built; if so, it just restarts them and links them together. If the containers don’t exist yet, it looks for the Dockerfile and builds them. The first one may take a few minutes, but the second and third will be almost instantaneous, since they use the same image and all the layers are already present.
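With that docker-compose.yml in place, the whole cluster is driven by a few short commands (a sketch, assuming Docker Compose is installed and you are in the directory containing the file):

```shell
docker-compose up -d   # build images if needed, then start all three nodes
docker-compose ps      # list the running cassnode containers
docker-compose stop    # stop the cluster; state is preserved for a restart
```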
When an image is built from a Dockerfile, every command is called a ‘Step’, and Docker makes a commit at the end of every step. This means that if you build an image with 10 steps, to get 10 packages via apt-get, it will make 10 internal commits. If you then add one more command (step) and run docker build again, it will reuse the previous commits and will only take a few seconds to get that single new package.
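The layer cache is easy to see in a hypothetical Dockerfile like this one: on a rebuild after appending the last line, only that final step actually runs.

```dockerfile
FROM ubuntu:latest
RUN apt-get update              # cached after the first build
RUN apt-get install -y curl     # cached
RUN apt-get install -y git      # cached
RUN apt-get install -y vim      # new step: only this layer is built on rebuild
```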
If the development environment is simple and requires little setup, then Docker may be overkill; but once things start to become complex, Docker does a great job of simplifying them again and making the replication of the development environment exact, easy, and completely transportable. Docker should be in every developer’s toolbox, even if it is down at the bottom next to the grout knife.