Recently we made two optimizations to our current pipeline:
- Split the deployer base image out into its own image and install the general, common packages in that stage.
- Switch the base image flavor from Alpine to Debian.
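The first one is easy to understand: the CI/CD pipeline rebuilds the deployer image on every run, and installing a large number of packages is time-consuming. If most of that work is already done in the base image, much less remains for each build.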
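A minimal sketch of such a split, not our actual pipeline files. It assumes a Debian-based python:3.9-slim parent, a registry path registry.example.com/deployer-base and the files requirements-common.txt and requirements.txt; all of these names are hypothetical. The base image is built and pushed only when its common packages change:

```dockerfile
# Dockerfile.base (hypothetical name): rebuilt only when the common packages change
FROM python:3.9-slim
# General OS packages every deployer build needs (example list)
RUN apt-get update \
    && apt-get install -y --no-install-recommends ca-certificates curl git \
    && rm -rf /var/lib/apt/lists/*
# General Python packages shared by all deployments
COPY requirements-common.txt /tmp/requirements-common.txt
RUN pip install --no-cache-dir -r /tmp/requirements-common.txt
```

The deployer image that the pipeline rebuilds on every run then starts from the prebuilt base, so only the project-specific steps remain:

```dockerfile
# Dockerfile (deployer): rebuilt on every pipeline run
FROM registry.example.com/deployer-base:latest
WORKDIR /app
# Only the project-specific packages are installed here
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
```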
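The second one had never occurred to me before. How do you pick a suitable base image, and why would you need to switch base images at all? Let's start with the different variants of the Python image (the situation differs from application to application; only Python is discussed here).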
Python Image Variants
There are 4 types:
- python:[version]
- python:[version]-slim
- python:[version]-alpine
- python:[version]-windowsservercore
The first two are both Debian-based and differ only in how many packages are installed: slim installs only the minimal set of packages needed to run Python. For Debian release code names, such as buster and stretch, see this reference.
You can check the Dockerfile of the Python image you use, for example the slim Dockerfile, to see which packages are already installed and avoid installing them twice.
So why switch Python images from Alpine to Debian? This article sums it up: Using Alpine can make Python Docker builds 50× slower. I also compared the Python images we use, and the Debian-based image was indeed faster at installing requirements.txt with pip, although this may change as the Alpine base is updated.
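One way to reproduce such a comparison is to build the same Dockerfile twice and change only the base image through a build argument. This is a sketch, assuming a requirements.txt in the build context and the 3.9 tags (both hypothetical choices). The likely cause of the gap, as that article explains, is that many packages on PyPI only ship prebuilt manylinux wheels built against glibc, which Alpine's musl libc cannot use, so pip falls back to compiling from source; on Alpine that step may even fail unless build tools such as gcc are installed first.

```dockerfile
# An ARG declared before FROM selects the base image at build time;
# the default is the Debian-based slim variant
ARG BASE=python:3.9-slim
FROM ${BASE}
WORKDIR /app
COPY requirements.txt .
# This is the step whose duration differs between Debian and Alpine bases
RUN pip install --no-cache-dir -r requirements.txt
```

Each variant can then be timed with the cache disabled, e.g. `time docker build --no-cache --build-arg BASE=python:3.9-slim .` versus the same command with `--build-arg BASE=python:3.9-alpine`.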
Other content on the same site is also well worth reading:
- Best Docker Base Image for Python Application 2021
- Overall articles list for Python production-ready practices
Other good articles:
- A Comparison Of Linux Container Images; the table in this article is very good.
Dockerfile Best Practices
```
# --no-cache: do not rely on build cache
```
Make sure not to include unnecessary files in your build context, as that results in a larger build context and a larger image. You can also use a .dockerignore file to exclude files from the build context.
When you pipe the Dockerfile in through stdin, no files are sent as build context:
```shell
# cannot use COPY in this way
```
Omitting the build context can be useful in situations where your Dockerfile does not require files to be copied into the image, and it improves build speed, as no files are sent to the daemon.
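A minimal Dockerfile that works without any build context, modeled on the example in the Docker documentation. The image tag hello and the file app.py mentioned in the comments are hypothetical:

```dockerfile
# Built with e.g. `docker build -t hello - < Dockerfile`: there is no build context,
# so every instruction must work without local files
FROM busybox
RUN echo "hello world"
# COPY app.py /app/
# ^ would fail in this mode, because there is no context to copy from
```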
Multi-stage builds allow you to drastically reduce the size of your final image, without struggling to reduce the number of intermediate layers and files. For example, the Elasticsearch curator Dockerfile also adopts this workflow, ordering the steps from the least to the most frequently changed so that the build cache stays reusable:
- Install tools you need to build your application
- Install or update library dependencies
- Generate your application
```dockerfile
# the simplest base image
```
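For a Python application, the same three steps can be sketched as a two-stage build. This is an illustration, not the curator Dockerfile: it assumes the python:3.9 / python:3.9-slim pair (both install Python under /usr/local, so a virtualenv built in one runs in the other), a requirements.txt and a hypothetical app.py. Shared libraries that compiled packages need at runtime (for example libpq) would still have to be installed in the slim stage.

```dockerfile
# syntax=docker/dockerfile:1
# Stage 1: the full python image ships compilers and headers for building packages
FROM python:3.9 AS build
WORKDIR /build
# Install dependencies into a virtualenv so they can be copied as one directory
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Stage 2: the final image keeps only the slim base plus the built virtualenv
FROM python:3.9-slim
COPY --from=build /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
WORKDIR /app
COPY . .
CMD ["python", "app.py"]
```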
To reduce complexity, dependencies, file sizes, and build times, avoid installing extra or unnecessary packages just because they might be “nice to have.” For example, you don’t need to include a text editor in a database image.
Only the instructions RUN, COPY, and ADD create layers. Other instructions create temporary intermediate images and do not increase the size of the build.
Sort multi-line arguments alphanumerically, for example on Debian:
```dockerfile
# Always combine RUN apt-get update with apt-get install in the same RUN
```
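A sketch of the sorted, single-RUN pattern on a Debian base; debian:bullseye-slim and the package list are example choices, not requirements. The apt lists are removed in the same RUN so they never reach a layer:

```dockerfile
FROM debian:bullseye-slim
# update + install in ONE RUN, so a cached `apt-get update` layer is never reused
# with a changed package list; one package per line, sorted alphabetically
RUN apt-get update && apt-get install -y --no-install-recommends \
        ca-certificates \
        curl \
        git \
        gnupg \
        unzip \
    && rm -rf /var/lib/apt/lists/*
```

Sorting makes duplicates easy to spot and keeps diffs small when a package is added or removed.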
Dockerfile Instructions
LABEL can be used to filter images with the -f option of the docker images command.
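A small sketch; the label keys and values are hypothetical, written in the reverse-DNS style Docker recommends for custom labels:

```dockerfile
FROM python:3.9-slim
# Hypothetical labels; several can be set in a single LABEL instruction
LABEL com.example.team="platform" \
      com.example.role="deployer"
```

After building, `docker images -f "label=com.example.role=deployer"` lists only the images that carry that label.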
Using pipes:
```dockerfile
# Docker executes these commands using the /bin/sh -c interpreter
```
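Because /bin/sh -c only evaluates the exit code of the last command in a pipe, a failing first command can go unnoticed. A sketch of the usual fix on a Debian base, where /bin/sh is dash and may not support `set -o pipefail`: switch the shell with the SHELL instruction. The URL and the /number file are placeholders.

```dockerfile
FROM debian:bullseye-slim
# Use bash with pipefail for all following RUN instructions
SHELL ["/bin/bash", "-o", "pipefail", "-c"]
RUN apt-get update \
    && apt-get install -y --no-install-recommends ca-certificates wget \
    && rm -rf /var/lib/apt/lists/*
# This step now fails if wget fails, instead of passing because wc succeeded
RUN wget -O - https://example.com | wc -l > /number
```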
CMD should rarely be used in the manner of CMD ["param", "param"] in conjunction with ENTRYPOINT, unless you and your expected users are already quite familiar with how ENTRYPOINT works. CMD should almost always be used in the form of CMD ["executable", "param1", "param2"…].
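A sketch of the recommended exec form, using Python's built-in http.server only as a stand-in for a real service (the port 8000 and the /srv directory are arbitrary):

```dockerfile
FROM python:3.9-slim
WORKDIR /srv
# Exec (JSON array) form: Docker starts python directly, without a /bin/sh -c
# wrapper, so python itself receives signals such as SIGTERM from `docker stop`
CMD ["python", "-m", "http.server", "8000"]
```

Arguments given to `docker run IMAGE ...` replace the whole CMD, e.g. `docker run --rm IMAGE python --version`.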
Using ENTRYPOINT with a docker-entrypoint.sh helper script is also common:
```dockerfile
COPY ./docker-entrypoint.sh /
```
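A slightly fuller sketch of this pattern. It assumes the helper script (not shown) performs some setup and ends with `exec "$@"`, so that CMD becomes the container's main process; app.py is a hypothetical application file.

```dockerfile
FROM python:3.9-slim
COPY ./docker-entrypoint.sh /
# Make sure the helper script is executable inside the image
RUN chmod +x /docker-entrypoint.sh
WORKDIR /app
COPY app.py .
ENTRYPOINT ["/docker-entrypoint.sh"]
# Default arguments handed to the script; `docker run IMAGE <args>` replaces them
CMD ["python", "app.py"]
```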
Each ENV line creates a new intermediate layer, just like RUN commands. This means that even if you unset the environment variable in a future layer, it still persists in this layer and its value can be dumped. To prevent this, and really unset the environment variable, use a RUN command with shell commands to set, use, and unset the variable all in a single layer. You can separate your commands with ; or &&:
```dockerfile
# syntax=docker/dockerfile:1
```
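A sketch of the single-RUN approach, adapted from the example in the Docker documentation; ADMIN_USER, the value mark and the ./mark file are placeholders (the file only shows that the value was used, and it does persist as a file):

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.9-slim
# Set, use and unset the variable inside ONE RUN, so it never becomes part of the
# image's environment. With `ENV ADMIN_USER=mark` followed by `RUN unset ADMIN_USER`,
# the value would still show up in `docker run --rm IMAGE env`.
RUN export ADMIN_USER="mark" \
    && echo $ADMIN_USER > ./mark \
    && unset ADMIN_USER
CMD ["sh"]
```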
Although ADD and COPY are functionally similar, generally speaking, COPY is preferred. If multiple build steps use different files from the build context, COPY them individually rather than all at once, so that each step's build cache is invalidated only when the files it actually needs change.
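The classic Python application of this rule, sketched with a hypothetical requirements.txt and app.py: copy the dependency list before the rest of the source.

```dockerfile
FROM python:3.9-slim
WORKDIR /app
# Copy only the dependency list first: this COPY and the pip install below
# are re-run only when requirements.txt itself changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy the rest of the source last; editing application code invalidates
# the cache only from this step on
COPY . .
CMD ["python", "app.py"]
```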
Because image size matters, using ADD to fetch packages from remote URLs is strongly discouraged; you should use curl or wget instead. That way you can delete the files you no longer need after they've been extracted, and you don't have to add another layer to your image.
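A sketch of the download, extract and clean-up sequence in a single RUN; the URL and the /opt/tool directory are hypothetical. Because the archive is deleted in the same RUN that downloaded it, it never ends up in any layer:

```dockerfile
FROM debian:bullseye-slim
RUN apt-get update \
    && apt-get install -y --no-install-recommends ca-certificates curl \
    && rm -rf /var/lib/apt/lists/*
# Download, extract and delete the archive within ONE layer
RUN mkdir -p /opt/tool \
    && curl -fsSL https://example.com/tool.tar.gz -o /tmp/tool.tar.gz \
    && tar -xzf /tmp/tool.tar.gz -C /opt/tool \
    && rm /tmp/tool.tar.gz
```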
You are strongly encouraged to use VOLUME for any mutable and/or user-serviceable parts of your image. (I rarely use it myself.)
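A minimal sketch, assuming a hypothetical /var/lib/myapp directory that the application writes to at runtime. Prepare the directory before declaring it, because later build steps cannot change a declared volume:

```dockerfile
FROM python:3.9-slim
# Create (and populate, if needed) the directory first...
RUN mkdir -p /var/lib/myapp
# ...then declare it as a volume; changes made to /var/lib/myapp by any
# build step after this line are discarded
VOLUME /var/lib/myapp
```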
Avoid installing or using sudo as it has unpredictable TTY and signal-forwarding behavior that can cause problems. If you absolutely need functionality similar to sudo, such as initializing the daemon as root but running it as non-root, consider using gosu.
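A sketch of installing gosu on a Debian base, where it is available as a package; the user name app is hypothetical. `gosu nobody true` is the quick smoke test suggested by the gosu project:

```dockerfile
FROM debian:bullseye-slim
RUN apt-get update \
    && apt-get install -y --no-install-recommends gosu \
    && rm -rf /var/lib/apt/lists/* \
    && gosu nobody true
# Hypothetical unprivileged user for the daemon
RUN groupadd --system app && useradd --system --gid app app
# The container still starts as root; an entrypoint script (not shown) would do
# root-only setup and then drop privileges with: exec gosu app "$@"
```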
Lastly, to reduce layers and complexity, avoid switching USER back and forth frequently.
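A sketch of switching USER only once, near the end; the user name app and the file app.py are hypothetical:

```dockerfile
FROM python:3.9-slim
# Do everything that needs root first...
RUN groupadd --system app && useradd --system --gid app --create-home app
WORKDIR /app
# Copied files are owned by the non-root user (the /app directory itself stays root-owned)
COPY --chown=app:app . .
# ...then switch to the non-root user once
USER app
CMD ["python", "app.py"]
```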
For clarity and reliability, you should always use absolute paths for your WORKDIR.
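A short sketch of why absolute paths help; the /app directory is an arbitrary choice:

```dockerfile
FROM python:3.9-slim
# Absolute path: unambiguous, and the directory is created if it does not exist
WORKDIR /app
# A relative path such as `WORKDIR src` would resolve against the previous
# WORKDIR (here /app/src), which gets hard to follow in longer Dockerfiles
COPY . .
# Prefer WORKDIR over `RUN cd /somewhere && ...`: a cd only lasts for that single RUN
RUN python -m compileall -q .
```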
Think of the ONBUILD command as an instruction the parent Dockerfile gives to the child Dockerfile.
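A sketch of a parent image that uses ONBUILD, assuming it is built and tagged as the hypothetical my-python-onbuild:

```dockerfile
# Parent Dockerfile, built e.g. as the hypothetical image my-python-onbuild
FROM python:3.9-slim
WORKDIR /app
# These are only registered while the parent is built; they run right after
# the FROM line of every child build that is based on this image
ONBUILD COPY requirements.txt .
ONBUILD RUN pip install --no-cache-dir -r requirements.txt
ONBUILD COPY . .
CMD ["python", "app.py"]
```

A child Dockerfile can then consist of just `FROM my-python-onbuild`; its own build context supplies requirements.txt and the application code.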