
Best Practices

Create ephemeral containers

The image defined by your Dockerfile should generate containers that are as ephemeral as possible. By ephemeral, we mean that the container can be stopped and destroyed, then rebuilt and replaced with an absolute minimum of setup and configuration.

Refer to Processes under the Twelve-Factor App methodology to get a feel for the motivations for running containers in such a stateless fashion.

Understand build context

When you issue a docker build command, the current working directory is called the build context. By default, the Dockerfile is assumed to be located here, but you can specify a different location with the file flag (-f). Regardless of where the Dockerfile actually lives, all recursive contents of files and directories in the current directory are sent to the Docker daemon as the build context.
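
For example, a Dockerfile kept outside the context root can be selected with -f while the context stays the current directory (the file path dockerfiles/Dockerfile.prod below is just an illustration):

docker build -f dockerfiles/Dockerfile.prod -t myapp:1.0 .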

Pipe Dockerfile through stdin

Docker has the ability to build images by piping a Dockerfile through stdin with a local or remote build context. Piping a Dockerfile through stdin is useful for one-off builds without writing a Dockerfile to disk, or in situations where the Dockerfile is generated and should not persist afterwards. For example, the following commands are equivalent:

echo -e 'FROM busybox\nRUN echo "hello world"' | docker build -

docker build -<<EOF
FROM busybox
RUN echo "hello world"
EOF

Exclude with .dockerignore

To exclude files not relevant to the build (without restructuring your source repository) use a .dockerignore file. This file supports exclusion patterns similar to .gitignore files.

Before the docker CLI sends the context to the docker daemon, it looks for a file named .dockerignore in the root directory of the context. If this file exists, the CLI modifies the context to exclude files and directories that match patterns in it. This helps to avoid unnecessarily sending large or sensitive files and directories to the daemon and potentially adding them to images using ADD or COPY.
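
For example, a minimal .dockerignore for a Node project might look like this (the entries are illustrative):

# .dockerignore
.git
node_modules
npm-debug.log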

Use multi-stage builds

Multi-stage builds allow you to drastically reduce the size of your final image, without struggling to reduce the number of intermediate layers and files. Because an image is built during the final stage of the build process, you can minimize image layers by leveraging build cache.

For example, if your build contains several layers, you can order them from the less frequently changed (to ensure the build cache is reusable) to the more frequently changed, as in the sketch after this list:

  • Install tools you need to build your application
  • Install or update library dependencies
  • Generate your application
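
A minimal multi-stage sketch for a Node application like the one used in the examples below (the build script and file layout are assumptions):

# build stage: includes the tooling needed to generate the app
FROM node:7-alpine AS build
WORKDIR /app
# dependencies change less often than source, so copy them first
COPY package.json .
RUN npm install
# application source changes most often, so it comes last
COPY . .
RUN npm run build

# final stage: only the artifacts and runtime dependencies
FROM node:7-alpine
WORKDIR /app
COPY package.json .
RUN npm install --production
COPY --from=build /app/dist ./dist
CMD npm start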

Don’t install unnecessary packages

To reduce complexity, dependencies, file sizes, and build times, avoid installing extra or unnecessary packages just because they might be “nice to have.” For example, you don’t need to include a text editor in a database image.

Decouple applications

Each container should have only one concern. Decoupling applications into multiple containers makes it easier to scale horizontally and reuse containers. For instance, a web application stack might consist of three separate containers, each with its own unique image, to manage the web application, database, and an in-memory cache in a decoupled manner.

Limiting each container to one process is a good rule of thumb, but it is not a hard and fast rule. For example, a container may be spawned with an init process, and some programs spawn additional processes of their own accord: Celery can spawn multiple worker processes, and Apache can create one process per request.

Use your best judgment to keep containers as clean and modular as possible. If containers depend on each other, you can use Docker container networks to ensure that these containers can communicate.

Minimize the number of layers

In older versions of Docker, it was important that you minimized the number of layers in your images to ensure they were performant. The following features were added to reduce this limitation:

  • Only the instructions RUN, COPY, and ADD create layers. Other instructions create temporary intermediate images, and do not increase the size of the build.
  • Where possible, use multi-stage builds, and only copy the artifacts you need into the final image. This allows you to include tools and debug information in your intermediate build stages without increasing the size of the final image.

Sort multi-line arguments

Whenever possible, ease later changes by sorting multi-line arguments alphanumerically. This helps to avoid duplication of packages and makes the list much easier to update. It also makes PRs a lot easier to read and review. Adding a space before a backslash (\) helps as well.

Here’s an example from the buildpack-deps image:

RUN apt-get update && apt-get install -y \
  bzr \
  cvs \
  git \
  mercurial \
  subversion \
  && rm -rf /var/lib/apt/lists/*

Do not use 'latest' base image tag

The latest tag is the default, used when no other tag is specified, so the instruction FROM ubuntu does exactly the same thing as FROM ubuntu:latest. But the latest tag will point to a different image when a new version is released, and your build may break. So, unless you are creating a generic Dockerfile that must stay up to date with the base image, provide a specific tag.

In our example, let's use the 16.04 tag:

# pinning the tag is that easy
FROM ubuntu:16.04

RUN apt-get update && apt-get install -y nodejs 
ADD . /app
RUN cd /app && npm install

CMD npm start

Remove unneeded files after each RUN step

Suppose we have updated the apt-get sources, installed a few packages required for compiling others, and downloaded and extracted archives. We obviously don't need them in our final image, so let's clean them up. Size matters!

In our example we can remove the apt-get lists (created by apt-get update):

FROM ubuntu:16.04

RUN apt-get update \
    && apt-get install -y nodejs \
    # added lines
    && rm -rf /var/lib/apt/lists/*

ADD . /app
RUN cd /app && npm install

CMD npm start

Use a proper base image

In our example, we used ubuntu. But why? Do we really need a general-purpose base image when we just want to run a Node application? A better option is to use a specialized image with Node already installed:

FROM node

ADD . /app
# node is preinstalled, so we no longer
# need apt-get to install it
RUN cd /app && npm install

CMD npm start

Or, even better, we can choose the Alpine version. Alpine is a very tiny Linux distribution, about 4 MB in size, which makes it a perfect candidate for a base image:

FROM node:7-alpine

ADD . /app
RUN cd /app && npm install

CMD npm start

Alpine has its own package manager, called apk. It's a bit different from apt-get, but still quite easy to learn. It also has some really useful features, like the --no-cache and --virtual options.
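
For example, packages needed only at build time can be installed and removed within a single layer (the package names below are illustrative): --no-cache skips storing the package index, and --virtual groups the packages under one name so they can be deleted together.

RUN apk add --no-cache --virtual .build-deps gcc musl-dev \
    && npm install \
    && apk del .build-deps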

FROM Instruction

Whenever possible, use current official images as the basis for your images. We recommend the Alpine image as it is tightly controlled and small in size (currently under 5 MB), while still being a full Linux distribution.

Alpine Linux is a Linux distribution built around musl libc and BusyBox. The image is only 5 MB in size and has access to a package repository that is much more complete than other BusyBox-based images. This makes Alpine Linux a great image base for utilities and even production applications. Read more about Alpine Linux at alpinelinux.org to see how its mantra fits right at home with Docker images.

RUN Instruction

Split long or complex RUN statements on multiple lines separated with backslashes to make your Dockerfile more readable, understandable, and maintainable.

Always combine RUN apt-get update with apt-get install in the same RUN statement. Using apt-get update alone in a RUN statement causes caching issues: the update layer can be reused from cache, so a later apt-get install may work from outdated package lists. For example:

RUN apt-get update && apt-get install -y \
    package-bar \
    package-baz \
    package-foo  \
    && rm -rf /var/lib/apt/lists/*

Prefer COPY over ADD

Although ADD and COPY are functionally similar, generally speaking, COPY is preferred. That’s because it’s more transparent than ADD. COPY only supports the basic copying of local files into the container, while ADD has some features (like local-only tar extraction and remote URL support) that are not immediately obvious. Consequently, the best use for ADD is local tar file auto-extraction into the image, as in ADD rootfs.tar.xz /.

If you have multiple Dockerfile steps that use different files from your context, COPY them individually, rather than all at once. This ensures that each step’s build cache is only invalidated (forcing the step to be re-run) if the specifically required files change.

For example:

COPY requirements.txt /tmp/
RUN pip install --requirement /tmp/requirements.txt
COPY . /tmp/

Add HEALTHCHECK

We can start a Docker container with the --restart always option; after the container crashes, the Docker daemon will try to restart it. This is very useful if your container has to be operational all the time. But what if the container is running yet not available (an infinite loop, invalid configuration, etc.)? With the HEALTHCHECK instruction we can tell Docker to periodically check the container's health status. The health check can be any command that returns exit code 0 if everything is OK and 1 otherwise.

FROM node:7-alpine
...
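# note: curl is not part of the node alpine image, so it has to be
# installed for the health check below to work
RUN apk add --no-cache curl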

ENV APP_PORT=3000

EXPOSE $APP_PORT
HEALTHCHECK CMD curl --fail http://localhost:$APP_PORT || exit 1
...

LABEL Instruction

You can add metadata to an image, such as the maintainer's contact details or an extended description, with the LABEL instruction (previously the MAINTAINER instruction served this purpose, but it is now deprecated). Metadata is sometimes used by external programs; for example, nvidia-docker requires the com.nvidia.volumes.needed label to work properly.

An example of metadata in our Dockerfile:

FROM node:7-alpine
LABEL maintainer="john.doe@example.com"
...

Use scripts

Sometimes it makes sense to break out a part of a Dockerfile into a shell script. For instance, while it’s good to clean up after installing a package through a package manager, it can get awkward to have a long && chain in a RUN command just to enforce cleanliness.
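
A minimal sketch, assuming a hypothetical install.sh committed next to the Dockerfile:

COPY install.sh /tmp/
RUN /tmp/install.sh && rm /tmp/install.sh

where install.sh keeps the install-and-clean-up chain in one readable place:

#!/bin/sh
# install.sh: install packages and clean up in one layer
set -e
apt-get update
apt-get install -y package-foo
rm -rf /var/lib/apt/lists/*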

Exercise

  1. Write a Dockerfile (a possible solution sketch follows this exercise):
    • Use nginx as the base image.
    • Copy index.html to /usr/share/nginx/html/.
    • Copy system-status.html to /usr/share/nginx/html/.
    • RUN apt-get update && apt-get install -y wget
    • HEALTHCHECK CMD wget -q --method=HEAD localhost/system-status.html
  2. Run the container with the name docker-health, mapping port 80 of the container to port 8080 of the host.
  3. Check that the container is responding on port 8080.
  4. Check the status of the containers with docker ps.
  5. Delete the file: docker exec docker-health rm -rf /usr/share/nginx/html/system-status.html
  6. Check the status of the containers again with docker ps.
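
One possible solution sketch for step 1 (assuming index.html and system-status.html sit next to the Dockerfile):

FROM nginx
COPY index.html /usr/share/nginx/html/
COPY system-status.html /usr/share/nginx/html/
RUN apt-get update && apt-get install -y wget \
    && rm -rf /var/lib/apt/lists/*
HEALTHCHECK CMD wget -q --method=HEAD localhost/system-status.html

The remaining steps might look like this (the image and container names are just examples):

docker build -t docker-health-demo .
docker run -d --name docker-health -p 8080:80 docker-health-demo
curl http://localhost:8080
docker ps
docker exec docker-health rm -rf /usr/share/nginx/html/system-status.html
docker ps

After the file is deleted, the STATUS column in docker ps should eventually change from (healthy) to (unhealthy).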