Docker Multi-process

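An application container usually runs a single main service process (which may spawn child processes). Something has to act as the init process (PID 1) to reap children and handle signals. In other words, if your service process forks but never reaps its children, you need an init process; otherwise you end up with zombie processes.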
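To make "fork but no reaping" concrete, here is a minimal Python sketch (Linux only, since it reads /proc). The helper names child_state and make_zombie_then_reap are made up for this illustration. It forks a child that exits immediately, observes the child sitting in state Z (zombie) until the parent calls waitpid, and then checks that the process entry is gone after reaping.

```python
import os
import time


def child_state(pid):
    """Return the one-letter process state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The state field is the first field after the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]


def make_zombie_then_reap():
    """Fork a child that exits at once; report its state before and after reaping."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately without running any cleanup

    # Parent: poll until the child has exited but has not been reaped yet.
    deadline = time.monotonic() + 5
    state = child_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = child_state(pid)

    os.waitpid(pid, 0)  # reaping removes the zombie entry
    gone = not os.path.exists(f"/proc/{pid}")
    return state, gone


if __name__ == "__main__":
    print(make_zombie_then_reap())
```

An init process does the waitpid part for every orphan that gets re-parented to it, which is the work a plain service process running as PID 1 usually skips.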

This matters most when you run a third-party app in a container: you often cannot tell whether it spawns child processes or reaps them, so it is safest to use an init process. See this article and what is the advantage of tini.

Exploration

Here is an article about running multiple services in Docker. The options could be:

  • Use --init, which starts a docker-init process backed by tini.
# docker-init shows up in the output of ps aux
docker run --init -itd nginx
  • Use a wrapper script, for example an entrypoint script.
  • Run the main process alongside temporary processes, using job control in the wrapper script.
  • Install and configure a dedicated init process, for example supervisord, tini, or dumb-init.

To catch signals and reap child processes without tini or another dedicated init process, you have to write that code yourself.
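As a rough idea of what "writing that code yourself" involves, the sketch below is a tiny init-like wrapper in Python. It is not tini's implementation; the name run_as_init, the FORWARDED subset, the 1-second timeout, and the 128 + signal exit-code convention are assumptions chosen for illustration. It blocks the signals it cares about, starts the child with the original signal mask restored, forwards received signals to the child, and reaps every child that exits.

```python
import os
import signal

# Signals this sketch forwards to the child (an illustrative subset).
FORWARDED = {signal.SIGTERM, signal.SIGINT, signal.SIGHUP}


def run_as_init(argv):
    """Run argv as a child, forward signals to it, and reap all children.

    Returns the child's exit code (128 + signal number if it was killed).
    """
    watched = FORWARDED | {signal.SIGCHLD}
    # Block before forking so no signal is lost between fork and the wait loop.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, watched)
    try:
        child = os.fork()
        if child == 0:
            # The child must not inherit the blocked mask.
            signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
            try:
                os.execvp(argv[0], argv)
            finally:
                os._exit(127)  # only reached if exec failed

        exit_code = None
        while exit_code is None:
            # Wait up to 1 second for a signal; None means the wait timed out.
            info = signal.sigtimedwait(watched, 1.0)
            if info is not None and info.si_signo != signal.SIGCHLD:
                os.kill(child, info.si_signo)  # forward to the main child

            # Reap every child that has exited, including adopted orphans.
            while True:
                try:
                    pid, status = os.waitpid(-1, os.WNOHANG)
                except ChildProcessError:
                    if exit_code is None:
                        raise  # no children left, yet the main child was never seen
                    break
                if pid == 0:
                    break  # children exist, but none has exited yet
                if pid == child:
                    if os.WIFEXITED(status):
                        exit_code = os.WEXITSTATUS(status)
                    else:
                        exit_code = 128 + os.WTERMSIG(status)
        return exit_code
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)


if __name__ == "__main__":
    print(run_as_init(["sh", "-c", "exit 7"]))
```

Even this small version has to get masking, forwarding, and reaping right, which is a good argument for using a dedicated init process instead.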

init Process

Take a look at the init processes commonly used in containers:

For tini, the steps are:

# in alpine docker
RUN apk add --no-cache tini
# tini is now available at /sbin/tini
ENTRYPOINT ["/sbin/tini", "--"]
# or
ENTRYPOINT ["/sbin/tini", "--", "/docker-entrypoint.sh"]

How tini Proxies Signal

I previously read an article about how Linux delivers signals to a container's init process. It explains why kill 1 inside a container sometimes fails, and when the kernel does or does not deliver a signal to the init process. That article only covers part of the source code, namely the combination of init process (SIGNAL_UNKILLABLE) + non-default signal handler + current namespace; see the second if condition: https://github.com/torvalds/linux/blob/a76c3d035872bf390d2fd92d8e5badc5ee28b17d/kernel/signal.c#L79-L99

static bool sig_task_ignored(struct task_struct *t, int sig, bool force)
{
	void __user *handler;

	handler = sig_handler(t, sig);

	/* SIGKILL and SIGSTOP may not be sent to the global init */
	if (unlikely(is_global_init(t) && sig_kernel_only(sig)))
		return true;

	/***** see this condition *****/
	if (unlikely(t->signal->flags & SIGNAL_UNKILLABLE) &&
	    handler == SIG_DFL && !(force && sig_kernel_only(sig)))
		return true;
	/*******/

	/* Only allow kernel generated signals to this kthread */
	if (unlikely((t->flags & PF_KTHREAD) &&
		     (handler == SIG_KTHREAD_KERNEL) && !force))
		return true;

	return sig_handler_ignored(handler, sig);
}

The article's emphasis is on the sigcgt bitmask, which is consistent with what Docker documents here: https://docs.docker.com/engine/reference/run/#foreground

A process running as PID 1 inside a container is treated specially by Linux: it ignores any signal with the default action. As a result, the process will not terminate on SIGINT or SIGTERM unless it is coded to do so.

In other words, if the user registers a SIGTERM handler in the init process (the sigcgt bit is set to 1), then handler == SIG_DFL is false, so the init process can receive the signal.

The problem is that when I checked the tini init process, its sigcgt signal bitmask was 0 in every field, so by this logic the kernel would not even deliver the signals. So how can tini forward signals if no signal is delivered at all? I have opened a question regarding this.
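For reference, these bitmasks can be read from /proc/<pid>/status, where bit n-1 stands for signal number n. The Python sketch below (Linux only) shows one way to read and decode them; the helper names read_signal_masks and decode_mask are hypothetical, and the sketch reads SigBlk and SigIgn alongside SigCgt simply because they sit next to each other in the same file.

```python
import signal


def read_signal_masks(pid="self"):
    """Return the SigBlk, SigIgn and SigCgt bitmasks of a process as integers."""
    masks = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key, _, value = line.partition(":")
            if key in ("SigBlk", "SigIgn", "SigCgt"):
                masks[key] = int(value.strip(), 16)  # the values are hex strings
    return masks


def decode_mask(mask):
    """Translate a signal bitmask into a list of signal names."""
    names = []
    for signum in range(1, 65):
        if mask & (1 << (signum - 1)):  # bit n-1 represents signal n
            try:
                names.append(signal.Signals(signum).name)
            except ValueError:
                # Signal numbers that Python has no symbolic name for.
                names.append(f"SIG{signum}")
    return names


if __name__ == "__main__":
    for field, mask in read_signal_masks().items():
        print(field, decode_mask(mask))
```

To inspect a container's init process from the host, pass its PID in the host namespace, for example read_signal_masks(12345), where 12345 is a placeholder.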

From the author's comment, I learned that the way Tini catches signals is by blocking all signals that should be forwarded to the child, and then waiting for them via sigtimedwait. A closer look at the caller of sig_task_ignored shows why this works: https://github.com/torvalds/linux/blob/a76c3d035872bf390d2fd92d8e5badc5ee28b17d/kernel/signal.c#L101-L120

static bool sig_ignored(struct task_struct *t, int sig, bool force)
{
	/*
	 * Blocked signals are never ignored, since the
	 * signal handler may change by the time it is
	 * unblocked.
	 */
	if (sigismember(&t->blocked, sig) || sigismember(&t->real_blocked, sig))
		return false;

	/*
	 * Tracers may want to know about even ignored signal unless it
	 * is SIGKILL which can't be reported anyway but can be ignored
	 * by SIGNAL_UNKILLABLE task.
	 */
	if (t->ptrace && sig != SIGKILL)
		return false;

	return sig_task_ignored(t, sig, force);
}

As the first comment says, blocked signals are never ignored! The kernel keeps them pending instead of dropping them, so tini always receives the signals it has blocked.
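The same mechanism can be tried from user space. The Python sketch below is only an illustration of "block, then wait synchronously" and does not reproduce the PID 1 / SIGNAL_UNKILLABLE case. The function name receive_without_handler is made up, and the signal is sent with pthread_kill to the calling thread (an assumption made so the demo stays safe in multi-threaded programs). No handler is installed, yet the signal is still received.

```python
import signal
import threading


def receive_without_handler(signum=signal.SIGUSR1):
    """Block signum, send it to this thread, then accept it via sigtimedwait.

    Returns the received signal number, or None if the wait timed out.
    """
    # Blocking keeps the signal pending instead of running its default action.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signum})
    try:
        # Direct the signal at the calling thread, which has it blocked.
        signal.pthread_kill(threading.get_ident(), signum)
        # Synchronously dequeue the pending signal; no handler is involved.
        info = signal.sigtimedwait({signum}, 1.0)
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return None if info is None else info.si_signo


if __name__ == "__main__":
    print(receive_without_handler())
```

Because the signal is consumed by sigtimedwait rather than by a handler, the caught-signal bitmask stays untouched, which matches the all-zero sigcgt observed for tini.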

The sigtimedwait call in tini's main loop simply blocks execution until a signal arrives or the timeout expires: https://github.com/krallin/tini/blob/378bbbc8909a960e89de220b1a4e50781233a740/src/tini.c#L501-L514

int wait_and_forward_signal(sigset_t const* const parent_sigset_ptr, pid_t const child_pid) {
	siginfo_t sig;

	if (sigtimedwait(parent_sigset_ptr, &sig, &ts) == -1) {
		switch (errno) {
			case EAGAIN:
				break;
			case EINTR:
				break;
			default:
				PRINT_FATAL("Unexpected error in sigtimedwait: '%s'", strerror(errno));
				return 1;
		}
	} else {

See the section "Synchronously accepting a signal" in signal(7) for an explanation of sigtimedwait.

You can also use sudo strace -p <pid> to observe how tini forwards signals:

# tini is init process
# bash is child
docker run --name test -itd tini_image:latest bash

# find out pid in host namespace
ps -ef | grep bash

sudo strace -p <tini pid>
sudo strace -p <child bash pid>

docker stop test