Alan Zhan Blog

I was recently organizing Docker knowledge for myself and came across Dockerfile best practices. I planned to summarize them, but then I found that the official documentation already has best practices, so I decided to translate them and add my own insights.

Containers Should Be Ephemeral

Containers built from a Dockerfile should be as ephemeral as possible. “Ephemeral” here means a container can be stopped and destroyed, then rebuilt and replaced with an absolute minimum of setup and configuration.

Understand the Build Context

Including files that aren’t needed to build the image results in a larger build context and larger image. This increases build time, pull/push time, and the runtime size of the container.

  • Pay attention to the context directory during builds, which is typically . (the current directory).
  • Make good use of .dockerignore, which we’ll cover in more detail below.

Use .dockerignore Files

A .dockerignore file works much like .gitignore in Git. Where .gitignore keeps unnecessary files out of the repository, .dockerignore excludes files that match its patterns from the build context, keeping the context as small as possible.
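As a rough illustration, the sketch below writes a small .dockerignore into a throwaway directory. The directory and the specific patterns are assumptions chosen for the example, not rules from the official guide; adjust them to your own project.

```shell
# Create a throwaway build context so the example does not touch a real project.
ctx="$(mktemp -d)"

# Hypothetical ignore rules: one pattern per line, '#' starts a comment.
cat > "$ctx/.dockerignore" <<'EOF'
# version-control metadata
.git
# locally installed dependencies
node_modules
# log files in the context root (use **/*.log to match at any depth)
*.log
EOF

# Count the rules that were written (comment lines are skipped).
rules="$(grep -vc '^#' "$ctx/.dockerignore")"
echo "ignore rules: $rules"
```

With this file in place, a build that uses the directory as its context would leave the matching paths out of what is sent to the Docker daemon.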

Pipe Dockerfile Through stdin

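If you only need a one-off image and don’t want to write a Dockerfile to disk, you can pipe the Dockerfile to docker build through stdin. Because - takes the place of the context path, no build context is sent, so COPY and ADD cannot reference local files in this form.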

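# Option 1: pipe the Dockerfile text to the build
printf 'FROM busybox\nRUN echo "hello world"\n' | docker build -

# Option 2: pass the Dockerfile as a here-document
docker build - <<EOF
FROM busybox
RUN echo "hello world"
EOF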

Use Multi-Stage Builds

Multi-stage builds can dramatically reduce the final image size. You also no longer need to minimize layers and files in the intermediate stages, because only the artifacts you copy into the final stage end up in the image.

Here’s a Go example: during the build process, we need the SDK image to compile the binary. After compilation, we only need to place the binary in an extremely lightweight image, ensuring the final image stays lean.

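# build stage: compile the binary with the full Go toolchain
FROM golang:alpine AS build
RUN apk --no-cache add build-base git bzr mercurial gcc
WORKDIR /src
# COPY is preferred over ADD for plain local files
COPY . .
RUN go build -o main

# final stage: ship only the compiled binary in a small base image
FROM alpine
WORKDIR /app
COPY --from=build /src/main /app/
# exec form runs the binary directly instead of through /bin/sh -c
ENTRYPOINT ["./main"]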

Minimize Installing Unnecessary Packages

To reduce complexity, dependencies, file size, and build time, avoid installing unnecessary software packages whenever possible.

Run Only One Process Per Container

Each container should have only one concern, so ideally only one process should run per container. Decouple other applications into separate containers as much as possible.

However, things aren’t always that simple — sometimes multiple processes do run in a single container. In that case, here are some recommended process management tools:

  • supervisord: An easy-to-use process management tool
  • tini: A minimal init process; Docker bundles it and enables it with docker run --init
  • systemd: A comprehensive solution, but relatively heavyweight
  • s6: A small process supervision suite, often used in containers through s6-overlay

Minimize the Number of Image Layers

In older versions of Docker, minimizing the number of layers in an image was important for performance. The following features were added to reduce that limitation:

  • Only the RUN, COPY, and ADD instructions create layers. Other instructions create temporary intermediate images and don’t increase the build size.
  • When possible, leverage multi-stage builds (see the Use Multi-Stage Builds section above).
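To make the first point concrete, a rough shell sketch can count the instructions that start with RUN, COPY, or ADD. The Dockerfile content and the grep heuristic are illustrative assumptions, not a Docker feature, and the count ignores multi-stage builds, where only the final stage contributes to the image.

```shell
# Write a hypothetical Dockerfile to a temporary file.
df="$(mktemp)"
cat > "$df" <<'EOF'
FROM alpine
ENV APP_HOME=/app
WORKDIR /app
COPY . /app
RUN apk --no-cache add curl \
    && rm -rf /tmp/*
CMD ["./run.sh"]
EOF

# Count lines that begin a RUN, COPY, or ADD instruction.
# Continuation lines are indented, so they are not counted again.
layers="$(grep -cE '^(RUN|COPY|ADD)[[:space:]]' "$df")"
echo "layer-creating instructions: $layers"
```

Chaining related commands with && inside a single RUN, as in the sample above, keeps them in one layer.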

Sort Multi-Line Arguments

Sort multi-line arguments alphabetically, for example when installing multiple packages. A sorted list makes duplicate packages easy to spot and is easier to maintain when updating.

RUN apt-get update && apt-get install -y \
  bzr \
  cvs \
  git \
  mercurial \
  subversion \
  && rm -rf /var/lib/apt/lists/*
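If you want to check a package list mechanically, one possible approach is to keep the names one per line and let sort verify them. This is only a sketch: the temporary file is a stand-in for wherever you keep the list, and it relies on the -c (check order) and -u (reject duplicates) options of sort.

```shell
# Hypothetical package list, one name per line.
pkgs="$(mktemp)"
cat > "$pkgs" <<'EOF'
bzr
cvs
git
mercurial
subversion
EOF

# sort -c checks the order without printing the input;
# combined with -u it also fails on duplicate lines.
if LC_ALL=C sort -c -u "$pkgs"; then
  status="sorted and unique"
else
  status="needs sorting or has duplicates"
fi
echo "$status"
```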

Leverage Build Cache

During an image build, Docker steps through the instructions in the Dockerfile and executes them in order. Before executing each instruction, Docker checks its cache for an existing image it can reuse. If one exists, Docker reuses it instead of creating a new, duplicate image.

You can disable caching with the --no-cache=true option of the docker build command. If you do use the cache, Docker follows these basic rules to find a matching image:

  • Starting from the cached parent image, the next instruction is compared against all child images of that parent to check whether any of them were created using the exact same instruction. If not, the cache is invalidated.
    • Note: Put the instructions and files that are least likely to change first in the Dockerfile. Once the cache is invalidated, every later instruction is rebuilt without it.
  • In most cases, simply comparing the instruction in the Dockerfile with the child image is sufficient.
  • For ADD and COPY instructions, the contents of the corresponding files in the image are also checked. A checksum is calculated for each file (though last modified time and last access time are not included in the checksum). During cache lookup, these checksums are compared with checksums of files in existing images. If files have changed, the cache is invalidated.
  • Apart from ADD and COPY instructions, cache matching does not look at files in the container. For example, when processing RUN apt-get -y update, Docker does not examine the files updated in the container; it uses only the command string itself to find a match.
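The ADD and COPY rule above can be illustrated with a content checksum that ignores timestamps. In this sketch, cksum stands in for Docker's internal checksum, which is an assumption made for illustration only; the point is that changing a file's modification time leaves a content checksum alone, while changing its bytes does not.

```shell
# Create a small file and record a checksum of its contents.
f="$(mktemp)"
echo 'hello' > "$f"
before="$(cksum < "$f")"

# Change only the modification time; the bytes stay the same.
touch -t 200001010000 "$f"
after_touch="$(cksum < "$f")"

# Change the contents.
echo 'world' >> "$f"
after_edit="$(cksum < "$f")"

[ "$before" = "$after_touch" ] && echo "touch: checksum unchanged"
[ "$before" != "$after_edit" ] && echo "edit: checksum changed"
```

By analogy, re-saving a file without changing its contents should not invalidate a COPY step, while any real edit will.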

References

Dockerfile Best Practices