Why are Containers so big?
Something that I’ve noticed while pulling containers from the Docker Hub on a slow connection is that most containers are needlessly massive. Why is this so? If a container is designed to run a single process, why aren’t containers the size of the package or binary download for the app they contain?
For example, the official nginx Debian packages are under 3MB, but the official nginx Docker image is almost 100MB! Let’s take a look at why that is by examining the Dockerfile:
FROM debian:wheezy
MAINTAINER NGINX Docker Maintainers "[email protected]"
RUN apt-key adv --keyserver pgp.mit.edu --recv-keys 573BFD6B3D8FBC641079A6ABABF5BD827BD9BF62
RUN echo "deb http://nginx.org/packages/mainline/debian/ wheezy nginx" >> /etc/apt/sources.list
ENV NGINX_VERSION 1.7.10-1~wheezy
RUN apt-get update && \
    apt-get install -y ca-certificates nginx=${NGINX_VERSION} && \
    rm -rf /var/lib/apt/lists/*
# (etc…)
The problem starts with the very first line, FROM debian:wheezy. We’re going to be running a single process inside our container, so why have we suddenly imported the whole of Debian wheezy, an 85MB image? I can only think of two reasons: the familiarity of using apt to install packages, and the fact that you might need to enter your container to debug it, so it’s nice to have dash, bash and the GNU core utilities available.
Luckily, the problem of a tiny, yet still usable Linux userland has been solved already. Busybox, originally designed for embedded devices, contains the most commonly used utilities compiled into a single binary, weighing in at around 800KB. You can create a Busybox container yourself using buildroot, or use one of the Busybox images on the Docker Hub, but none of them contain a good package manager, so their usefulness is a bit limited. Until now, that is…
The Alpine Linux Solution
Alpine Linux is a minimal, Busybox-based Linux distribution which has been adapted for use inside Docker containers by Glider Labs amongst others. What sets it apart from the other Busybox images is that it has a package manager with a decent set of packages available, meaning you can build containers as easily as the Debian one above. Let’s recreate the official nginx image using Alpine:
FROM gliderlabs/alpine
MAINTAINER LiveWyer
RUN apk-install ca-certificates nginx
EXPOSE 80 443
CMD ["nginx", "-g", "daemon off;"]
That was easy, and the resulting container is just 8MB, much more like it!
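If you want to check the size difference yourself, something like the following could be used. It is only a minimal sketch: compare_sizes is a helper name made up for this post, nginx-alpine is a hypothetical tag for the image built from the Dockerfile above, and it assumes a Docker version whose docker images command supports the --format option.

```shell
#!/bin/sh
# compare_sizes IMAGE...
# Prints "REPOSITORY:TAG SIZE" for each image name given, reading from the
# local image store. Assumes the images were already built or pulled.
# (compare_sizes is a hypothetical helper, not part of Docker.)
compare_sizes() {
    for image in "$@"; do
        docker images --format '{{.Repository}}:{{.Tag}} {{.Size}}' "$image"
    done
}

# Example usage (image names are assumptions):
#   docker pull nginx
#   docker build -t nginx-alpine .
#   compare_sizes nginx nginx-alpine
```

Each image name produces one line per local tag, so the Debian-based and Alpine-based images can be read off side by side.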
DevOps Engineer Summary
One counter-argument to building small containers is, “Due to Docker’s layered filesystem, you can reuse the base image in all your containers, so it doesn’t matter that it’s fairly large.” However, in my experience every organisation creates its own “base” image, usually derived from Debian or Ubuntu, so your Docker image storage directory is soon littered with large, redundant images. If you have to make your own base image anyway, why not make it Alpine-based?
Another argument is, “I have a fast connection and loads of disk, so why should I care about a few hundred MB?” I assume that if you’re using Docker, you’re going to be deploying large numbers of containers across a cloud. Saving 100MB per container should drastically improve your deploy times, especially given that the Docker Hub is sometimes rather slow!
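To put a rough number on that, here is a back-of-the-envelope sketch. Only the 100MB figure comes from this post; the host count and link speed are made-up illustrative values, and pull_savings is a hypothetical helper. It also assumes every host pulls the full image with nothing cached, and ignores compression and registry latency, so treat the result as an upper bound rather than a measurement.

```shell
#!/bin/sh
# pull_savings SAVED_MB HOSTS MBIT_PER_SEC
# Prints the total MB no longer transferred across all hosts, and the
# seconds saved on each host's pull at the given link speed.
pull_savings() {
    saved_mb=$1
    hosts=$2
    mbit_per_sec=$3
    total_mb=$((saved_mb * hosts))
    # 1 MB = 8 megabits; shell integer division rounds down.
    secs_per_host=$((saved_mb * 8 / mbit_per_sec))
    echo "total_mb=${total_mb} secs_per_host=${secs_per_host}"
}

# 100MB saved per image (from the post), 50 hosts and a 20Mbit/s link
# (both assumptions).
pull_savings 100 50 20
# prints: total_mb=5000 secs_per_host=40
```

With those assumed numbers, a fresh rollout moves about 5GB less data and each host finishes its pull roughly 40 seconds sooner.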
Try slimming down your containers, and see if you notice any benefits!