I was recently organizing Docker knowledge for myself and came across Dockerfile best practices. I planned to summarize them, but then I found that the official documentation already has best practices, so I decided to translate them and add my own insights.
Containers Should Be Ephemeral
Containers built from a Dockerfile should be as ephemeral as possible. “Ephemeral” here means the container can be stopped and destroyed, then rebuilt and replaced, with an absolute minimum of setup and configuration.
Understand the Build Context
Including files that aren’t needed to build the image results in a larger build context and a larger image. This increases build time, push/pull time, and the on-disk size of containers created from the image.
- Pay attention to the context directory during builds, which is typically . (the current directory).
- Make good use of .dockerignore, which we’ll cover in more detail below.
Use .dockerignore Files
The usage is similar to .gitignore in Git. For Git, .gitignore prevents unnecessary files from being committed to the repository. Similarly, .dockerignore ensures the build context is as small as possible.
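As an illustration, a minimal .dockerignore might exclude version-control metadata, build output, and logs. The entries below are hypothetical examples; adjust them for your own project:

```
# Example .dockerignore entries (illustrative; tailor to your project)
.git
node_modules
tmp/
*.log
Dockerfile
.dockerignore
```

Excluding the Dockerfile and .dockerignore themselves is a common touch, since they are rarely needed inside the image.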
Pipe Dockerfile Through stdin
If your image is temporary and you don’t want to bother writing a Dockerfile, you can build images through stdin piping.
echo -e 'FROM busybox\nRUN echo "hello world"' | docker build -
docker build -<<EOF
FROM busybox
RUN echo "hello world"
EOF
Use Multi-Stage Builds
Using multi-stage builds can dramatically reduce your final image size, and you won’t need to worry about optimizing the Dockerfile statements in non-final images.
Here’s a Go example: during the build process, we need the SDK image to compile the binary. After compilation, we only need to place the binary in an extremely lightweight image, ensuring the final image stays lean.
# build stage
FROM golang:alpine AS build
RUN apk --no-cache add build-base git bzr mercurial gcc
COPY . /src
RUN cd /src && go build -o main
# final stage
FROM alpine
WORKDIR /app
COPY --from=build /src/main /app/
ENTRYPOINT ["/app/main"]
Minimize Installing Unnecessary Packages
To reduce complexity, dependencies, file size, and build time, avoid installing unnecessary software packages whenever possible.
Run Only One Process Per Container
Each container should have only one concern, so ideally only one process should run per container. Decouple other applications into separate containers as much as possible.
However, things aren’t always that simple — sometimes multiple processes do run in a single container. In that case, here are some recommended process management tools:
- supervisord: An easy-to-use process management tool
- tini: A minimal init that Docker ships with and can inject automatically via docker run --init
- systemd: A comprehensive solution, but relatively heavyweight
- s6: A well-known process management tool
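As a sketch of the tini approach: it can be enabled at run time with docker run --init, or baked into the image as the entrypoint. The example below assumes an Alpine base, where the apk package installs tini at /sbin/tini:

```
# Hypothetical Alpine-based image using tini as PID 1 to reap zombie processes
FROM alpine
RUN apk add --no-cache tini
COPY app /app
# tini runs first and forwards signals to the command that follows
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["/app"]
```

With docker run --init you get the same behavior without modifying the Dockerfile at all.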
Minimize the Number of Image Layers
In older versions of Docker, minimizing the number of layers in an image was crucial for performance:
- Only the RUN, COPY, and ADD instructions create layers. Other instructions create temporary intermediate images and don’t increase the size of the build.
- When possible, leverage multi-stage builds (see the Use Multi-Stage Builds section above).
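For instance, chaining related commands into a single RUN instruction produces one layer instead of three, and cleaning up in the same instruction keeps that layer small (the curl package is just an illustration):

```
# One layer instead of three; the apt cache is removed in the same layer,
# so it never contributes to the final image size
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*
```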
Sort Multi-Line Arguments
Sort multi-line arguments alphabetically — for example, when installing multiple packages. This avoids installing duplicate packages and makes the package list easier to maintain when updating.
RUN apt-get update && apt-get install -y \
bzr \
cvs \
git \
mercurial \
subversion \
&& rm -rf /var/lib/apt/lists/*
Leverage Build Cache
During the image build process, Docker iterates through the instructions in the Dockerfile and executes them sequentially. Before each instruction is executed, Docker checks the cache for a reusable duplicate image. If one exists, it uses the cached image instead of creating a duplicate.
You can disable caching by using the --no-cache=true option of the docker build command.
- Starting from the cached parent image, the next instruction is compared against all child images of that parent to check whether any of them was created using the exact same instruction. If not, the cache is invalidated.
- Note: Instructions and files that are least likely to change should come first to maximize cache reuse.
- In most cases, simply comparing the instruction in the Dockerfile with the child image is sufficient.
- For ADD and COPY instructions, the contents of the corresponding files in the image are also checked. A checksum is calculated for each file (last modified and last accessed times are not included in the checksum). During cache lookup, these checksums are compared with the checksums of files in existing images; if any file has changed, the cache is invalidated.
- Apart from ADD and COPY instructions, the cache-matching process does not examine files in the temporary container. For example, when running RUN apt-get -y update, some files in the container are updated, but Docker does not check them; it matches the cache on the instruction string alone.
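One common way to exploit these rules, sketched here for the Go example from earlier, is to copy the dependency manifests and download modules before copying the rest of the source. Source edits then invalidate only the later layers, while the slow dependency download stays cached:

```
FROM golang:alpine AS build
WORKDIR /src
# These layers change only when dependencies change, so they stay cached
COPY go.mod go.sum ./
RUN go mod download
# Source changes invalidate the cache only from this point on
COPY . .
RUN go build -o main
```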
Feel free to leave a comment on my blog. Your feedback motivates me to keep writing. Thank you for reading, and let’s grow together to become better versions of ourselves.