Docker Base Image Choice

We recently made two optimizations to our current pipeline:

  1. Split the deployer base image out into its own image, and install the general, common packages at that stage.
  2. Switch the base image flavor from Alpine to Debian.

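The first is easy to understand: every CI/CD pipeline run rebuilds the deployer image, and installing a large number of packages is time-consuming. If most of that work is already done in the base image, far less remains for each rebuild.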
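As a rough sketch of that split, the base image could look like the Dockerfile below. The image tag, the package list, and the file name are assumptions made for illustration; they are not taken from our actual pipeline.

```dockerfile
# Dockerfile.base (hypothetical file name): rebuilt rarely, e.g. when the common packages change.
# The python:3.12-slim tag is an assumed example of a Debian-based Python image.
FROM python:3.12-slim

# Hypothetical set of general-purpose packages a deployer might need.
RUN apt-get update && apt-get install -y --no-install-recommends \
    ca-certificates \
    curl \
    git \
    openssh-client \
    && rm -rf /var/lib/apt/lists/*
```

The deployer Dockerfile that is rebuilt on every pipeline run would then start FROM the tag this image was pushed under and only add the project-specific steps.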

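As for the second, this had not occurred to me before: how do you choose a suitable base image, and why would you need to switch? Let's start with the variants of the Python image (every application is different; only Python is discussed here).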

Python Image Variants

There are 4 types:

  • python:[version]
  • python:[version]-slim
  • python:[version]-alpine
  • python:[version]-windowsservercore

The first two are both Debian-based; they differ only in how many packages are installed. slim installs only the minimal set of packages needed to run Python. For Debian release code names such as buster and stretch, see this reference.

You can read the Dockerfile of the Python image you use to see which packages are already installed and avoid installing them again, for example the slim Dockerfile.

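So why switch the Python image from Alpine to Debian? This article sums it up: Using Alpine can make Python Docker builds 50× slower. Its main explanation is that prebuilt wheels on PyPI have traditionally targeted glibc, while Alpine uses musl, so pip often has to compile packages from source. I also ran a comparison with the Python image we use, and the Debian-based image was indeed faster at installing a pip requirements.txt, though this may change as the Alpine base is updated.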

The other content on that article's site is also well worth reading.

Other good articles:

Dockerfile Best Practice

Dockerfile Best Practices.

# --no-cache: do not rely on build cache
# -f: specify dockerfile
# context: build context location, usually .(current dir)
docker build --no-cache -t helloapp:v2 -f dockerfiles/Dockerfile <context path>

Make sure you do not include unnecessary files in your build context; they result in a larger build context and can result in a larger image. Alternatively, use a .dockerignore file to exclude files from the build context.

Pipe the Dockerfile in through stdin; no files are sent as build context:

# COPY cannot be used this way, because there is no build context
# -: read the Dockerfile from stdin
echo -e 'FROM busybox\nRUN echo "hello world"' | docker build -
# the same thing with a here-document
docker build -<<EOF
FROM busybox
RUN echo "hello world"
EOF

Omitting the build context can be useful in situations where your Dockerfile does not require files to be copied into the image, and improves the build-speed, as no files are sent to the daemon.

Multi-stage builds allow you to drastically reduce the size of your final image, without struggling to reduce the number of intermediate layers and files. For example, the Elasticsearch Curator Dockerfile also adopts this workflow:

  • Install tools you need to build your application
  • Install or update library dependencies
  • Generate your application

# the simplest base image
FROM scratch
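To make the multi-stage workflow concrete for Python, here is a minimal sketch. It assumes a project with a requirements.txt and a main.py entry point, and the 3.12 tags are only examples. Dependencies are installed into a virtual environment in a full image, and only that environment is copied into the slim runtime image.

```dockerfile
# Stage 1: the full Debian-based image ships compilers and headers,
# so packages without prebuilt wheels can still be built here.
FROM python:3.12 AS builder
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Stage 2: the slim runtime image receives only the finished virtual environment.
FROM python:3.12-slim
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
WORKDIR /app
COPY . .
# main.py is a hypothetical entry point.
CMD ["python", "main.py"]
```

Both stages need the same Python version so that the copied environment still resolves its interpreter. Packages that link against system libraries may also need the matching runtime libraries installed in the second stage.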

To reduce complexity, dependencies, file sizes, and build times, avoid installing extra or unnecessary packages just because they might be “nice to have.” For example, you don’t need to include a text editor in a database image.

Only the instructions RUN, COPY, ADD create layers. Other instructions create temporary intermediate images, and do not increase the size of the build.

Sort multi-line arguments alphanumerically, for example on a Debian-based image:

# Always combine apt-get update with apt-get install in the same RUN;
# otherwise a rebuild without --no-cache reuses the cached apt-get update layer
# and installs from stale package lists
RUN apt-get update && apt-get install -y \
    bzr \
    cvs \
    git \
    mercurial \
    subversion \
    && rm -rf /var/lib/apt/lists/*
# removing /var/lib/apt/lists cleans up the apt cache and reduces the image size

Dockerfile Clause

LABEL can be used to filter images with the -f (--filter) option of the docker images command.
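A small sketch of how that fits together; the label keys and values are made up for illustration, and the image tag is only an example.

```dockerfile
FROM python:3.12-slim
# Hypothetical label keys and values.
LABEL com.example.team="platform" \
      com.example.component="deployer"

# After building, list only the images that carry a given label:
#   docker images -f "label=com.example.team=platform"
```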

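Using pipes:

# Docker runs shell-form RUN commands with /bin/sh -c, which only checks the exit
# code of the last command in a pipe; set -o pipefail fails the step if any stage fails
RUN set -o pipefail && wget -O - https://some.site | wc -l > /number

# not every /bin/sh supports -o pipefail (e.g. dash on Debian-based images);
# use the exec form to explicitly pick a shell that does
RUN ["/bin/bash", "-c", "set -o pipefail && wget -O - https://some.site | wc -l > /number"]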

CMD should rarely be used in the manner of CMD ["param", "param"] in conjunction with ENTRYPOINT, unless you and your expected users are already quite familiar with how ENTRYPOINT works. CMD should almost always be used in the form of CMD ["executable", "param1", "param2"…].

Using ENTRYPOINT with a docker-entrypoint.sh helper script is also common:

COPY ./docker-entrypoint.sh /
# the exec form is a JSON array, so it must use double quotes
ENTRYPOINT ["/docker-entrypoint.sh"]
# CMD supplies default arguments; they are replaced by any arguments
# given at the end of docker run:
# docker run -it --rm image_name:tag <param1> <param2> ...
CMD ["--help"]

Each ENV line creates a new intermediate layer, just like RUN commands. This means that even if you unset the environment variable in a future layer, it still persists in this layer and its value can be dumped. To prevent this, and really unset the environment variable, use a RUN command with shell commands, to set, use, and unset the variable all in a single layer. You can separate your commands with ; or &&:

# syntax=docker/dockerfile:1
FROM alpine
RUN export ADMIN_USER="mark" \
&& echo $ADMIN_USER > ./mark \
&& unset ADMIN_USER
CMD sh

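Although ADD and COPY are functionally similar, generally speaking, COPY is preferred. If several Dockerfile steps use different files from the build context, COPY them individually, each right before the step that needs it, rather than all at once. That way each step's build cache is invalidated only when the files it actually depends on change.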
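For a Python image, this typically looks like the sketch below. It assumes a requirements.txt at the root of the build context and a hypothetical main.py entry point; the image tag is only an example.

```dockerfile
FROM python:3.12-slim
WORKDIR /app

# Copy only the dependency manifest first: the pip install layer below
# stays cached until requirements.txt itself changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the source last; editing code invalidates only this layer.
COPY . .
CMD ["python", "main.py"]
```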

Because image size matters, using ADD to fetch packages from remote URLs is strongly discouraged; you should use curl or wget instead. That way you can delete the files you no longer need after they’ve been extracted and you don’t have to add another layer in your image.
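A sketch of the curl approach, modeled on the pattern in Docker's best-practices guide. The URL and the target directory are placeholders, and the base image tag is an assumption.

```dockerfile
FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y --no-install-recommends \
    ca-certificates \
    curl \
    xz-utils \
    && rm -rf /var/lib/apt/lists/*

# The archive is streamed straight into tar, so the compressed file
# is never stored in any layer. https://example.com/big.tar.xz is a placeholder.
RUN mkdir -p /usr/src/things \
    && curl -fsSL https://example.com/big.tar.xz \
    | tar -xJC /usr/src/things
```

With ADD, the downloaded archive would sit in its own layer, and deleting it in a later RUN would not shrink the image.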

You are strongly encouraged to use VOLUME for any mutable and/or user-serviceable parts of your image. (I rarely use it.)

Avoid installing or using sudo as it has unpredictable TTY and signal-forwarding behavior that can cause problems. If you absolutely need functionality similar to sudo, such as initializing the daemon as root but running it as non-root, consider using gosu.

Lastly, to reduce layers and complexity, avoid switching USER back and forth frequently.

For clarity and reliability, you should always use absolute paths for your WORKDIR.

Think of the ONBUILD command as an instruction the parent Dockerfile gives to the child Dockerfile.
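A minimal sketch of that parent/child relationship. The image name my-python-onbuild and the file names are hypothetical.

```dockerfile
# Parent Dockerfile, built and tagged as my-python-onbuild (hypothetical name).
FROM python:3.12-slim
WORKDIR /app

# These instructions are only recorded here; they run later, right after
# the FROM line of any child Dockerfile that builds on this image.
ONBUILD COPY requirements.txt .
ONBUILD RUN pip install --no-cache-dir -r requirements.txt
ONBUILD COPY . .

# A child Dockerfile can then be as short as:
#   FROM my-python-onbuild
#   CMD ["python", "main.py"]
```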
